AI Face Fusion

AI Face Fusion — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Ugly duckling theorem

    Ugly duckling theorem

    The ugly duckling theorem is an argument showing that classification is not really possible without some sort of bias. More particularly, it assumes finitely many properties combinable by logical connectives, and finitely many objects; it asserts that any two different objects share the same number of (extensional) properties. The theorem is named after Hans Christian Andersen's 1843 story "The Ugly Duckling", because it shows that a duckling is just as similar to a swan as two swans are to each other. It was derived by Satosi Watanabe in 1969. == Mathematical formula == Suppose there are n things in the universe, and one wants to put them into classes or categories. One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not. So one has to consider all the possible classes that could be, all the possible ways of making a set out of the n objects. There are 2 n {\displaystyle 2^{n}} such ways, the size of the power set of n objects. One can use that to measure the similarity between two objects, and one would see how many sets they have in common. However, one cannot. Any two objects have exactly the same number of classes in common if we can form any possible class, namely 2 n − 1 {\displaystyle 2^{n-1}} (half the total number of classes there are). To see this is so, one may imagine each class is represented by an n-bit string (or binary encoded integer), with a zero for each element not in the class and a one for each element in the class. As one finds, there are 2 n {\displaystyle 2^{n}} such strings. As all possible choices of zeros and ones are there, any two bit-positions will agree exactly half the time. One may pick two elements and reorder the bits so they are the first two, and imagine the numbers sorted lexicographically. The first 2 n / 2 {\displaystyle 2^{n}/2} numbers will have bit #1 set to zero, and the second 2 n / 2 {\displaystyle 2^{n}/2} will have it set to one. Within each of those blocks, the top 2 n / 4 {\displaystyle 2^{n}/4} will have bit #2 set to zero and the other 2 n / 4 {\displaystyle 2^{n}/4} will have it as one, so they agree on two blocks of 2 n / 4 {\displaystyle 2^{n}/4} or on half of all the cases, no matter which two elements one picks. So if we have no preconceived bias about which categories are better, everything is then equally similar (or equally dissimilar). The number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs. Thus, some kind of inductive bias is needed to make judgements to prefer certain categories over others. === Boolean functions === Let x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} be a set of vectors of k {\displaystyle k} booleans each. The ugly duckling is the vector which is least like the others. Given the booleans, this can be computed using Hamming distance. However, the choice of boolean features to consider could have been somewhat arbitrary. Perhaps there were features derivable from the original features that were important for identifying the ugly duckling. The set of booleans in the vector can be extended with new features computed as boolean functions of the k {\displaystyle k} original features. The only canonical way to do this is to extend it with all possible Boolean functions. The resulting completed vectors have 2 k {\displaystyle 2^{k}} features. The ugly duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features. Proof. Let x and y be two vectors. If they are the same, then their completed vectors must also be the same because any Boolean function of x will agree with the same Boolean function of y. If x and y are different, then there exists a coordinate i {\displaystyle i} where the i {\displaystyle i} -th coordinate of x {\displaystyle x} differs from the i {\displaystyle i} -th coordinate of y {\displaystyle y} . Now the completed features contain every Boolean function on k {\displaystyle k} Boolean variables, with each one exactly once. Viewing these Boolean functions as polynomials in k {\displaystyle k} variables over GF(2), segregate the functions into pairs ( f , g ) {\displaystyle (f,g)} where f {\displaystyle f} contains the i {\displaystyle i} -th coordinate as a linear term and g {\displaystyle g} is f {\displaystyle f} without that linear term. Now, for every such pair ( f , g ) {\displaystyle (f,g)} , x {\displaystyle x} and y {\displaystyle y} will agree on exactly one of the two functions. If they agree on one, they must disagree on the other and vice versa. (This proof is believed to be due to Watanabe.) == Discussion == A possible way around the ugly duckling theorem would be to introduce a constraint on how similarity is measured by limiting the properties involved in classification, for instance, between A and B. However Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem since in what respects A is similar to B: "varies with the stimulus context and task, so that there is no unique answer, to the question of how similar is one object to another". For example, "a barberpole and a zebra would be more similar than a horse and a zebra if the feature striped had sufficient weight. Of course, if these feature weights were fixed, then these similarity relations would be constrained". Yet the property "striped" as a weight 'fix' or constraint is arbitrary itself, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous". Stamos (2003) remarked that some judgments of overall similarity are non-arbitrary in the sense they are useful: "Presumably, people's perceptual and conceptual processes have evolved that information that matters to human needs and goals can be roughly approximated by a similarity heuristic... If you are in the jungle and you see a tiger but you decide not to stereotype (perhaps because you believe that similarity is a false friend), then you will probably be eaten. In other words, in the biological world stereotyping based on veridical judgments of overall similarity statistically results in greater survival and reproductive success." Unless some properties are considered more salient, or 'weighted' more important than others, everything will appear equally similar, hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar". In a weaker setting that assumes infinitely many properties, Murphy and Medin (1985) give an example of two putative classified things, plums and lawnmowers: "Suppose that one is to list the attributes that plums and lawnmowers have in common in order to judge their similarity. It is easy to see that the list could be infinite: Both weigh less than 10,000 kg (and less than 10,001 kg), both did not exist 10,000,000 years ago (and 10,000,001 years ago), both cannot hear well, both can be dropped, both take up space, and so on. Likewise, the list of differences could be infinite… any two entities can be arbitrarily similar or dissimilar by changing the criterion of what counts as a relevant attribute." According to Woodward, the ugly duckling theorem is related to Schaffer's Conservation Law for Generalization Performance, which states that all algorithms for learning of boolean functions from input/output examples have the same overall generalization performance as random guessing. The latter result is generalized by Woodward to functions on countably infinite domains.

    Read more →
  • Synthetic data

    Synthetic data

    Synthetic data are artificially generated data not produced by real-world events. Typically created using algorithms, synthetic data can be deployed to validate mathematical models and to train machine learning models. Data generated by a computer simulation can be seen as synthetic data. This encompasses most applications of physical modeling, such as music synthesizers or flight simulators. The output of such systems approximates the real thing, but is fully algorithmically generated. Synthetic data is used in a variety of fields as a filter for information that would otherwise compromise the confidentiality of particular aspects of the data. In many sensitive applications, datasets theoretically exist but cannot be released to the general public; synthetic data sidesteps the privacy issues that arise from using real consumer information without permission or compensation. == Usefulness == Synthetic data is generated to meet specific needs or certain conditions that may not be found in the original, real data. One of the hurdles in applying up-to-date machine learning approaches for complex scientific tasks is the scarcity of labeled data, a gap effectively bridged by the use of synthetic data, which closely replicates real experimental data. This can be useful when designing many systems, from simulations based on theoretical value, to database processors, etc. This helps detect and solve unexpected issues such as information processing limitations. Synthetic data are often generated to represent the authentic data and allows a baseline to be set. Another benefit of synthetic data is to protect the privacy and confidentiality of authentic data, while still allowing for use in testing systems. Computer security experts claim generated synthetic data "... enables us to create realistic behavior profiles for users and attackers. The data is used to train the fraud detection system itself, thus creating the necessary adaptation of the system to a specific environment." In defense and military contexts, synthetic data is seen as a potentially valuable tool to develop and improve complex AI systems, particularly in contexts where high-quality real-world data is scarce. At the same time, synthetic data together with the testing approach can give the ability to model real-world scenarios. == History == Scientific modelling of physical systems has a long history that runs concurrent with the history of physics. For example, research into synthesis of audio and voice can be traced back to the 1930s and before, driven forward by the developments of the telephone and audio recording technologies. Digitization gave rise to software synthesizers from the 1970s onwards. In the context of privacy-preserving statistical analysis, in 1993, the idea of original fully synthetic data was created by Donald Rubin. Rubin originally designed this to synthesize the Decennial Census long form responses for the short form households. He then released samples that did not include any actual long form records - in this he preserved anonymity of the household. Later that year, the idea of original partially synthetic data was created by Little. Little used this idea to synthesize the sensitive values on the public use file. A 1993 work fitted a statistical model to 60,000 MNIST digits, then it was used to generate over 1 million examples. Those were used to train a LeNet-4 to reach state of the art performance. In 1994, Stephen Fienberg introduced 'critical refinement', in which a parametric posterior predictive distribution (instead of a Bayes bootstrap) is used to do the sampling. Later, other important contributors to the development of synthetic data generation were Trivellore Raghunathan, Jerry Reiter, Donald Rubin, John M. Abowd, and Jim Woodcock. Collectively they came up with a solution for how to treat partially synthetic data with missing data. Similarly, they developed the technique of Sequential Regression Multivariate Imputation. == Calculations == Researchers test the framework on synthetic data, which is "the only source of ground truth on which they can objectively assess the performance of their algorithms". Synthetic data can be generated through the use of random lines, having different orientations and starting positions. Datasets can get fairly complicated. A more complicated dataset can be generated by using a synthesizer build. To create a synthesizer build, first use the original data to create a model or equation that fits the data the best. This model or equation will be called a synthesizer build. This build can be used to generate more data. Constructing a synthesizer build involves constructing a statistical model. In a linear regression line example, the original data can be plotted, and a best fit linear line can be created from the data. This line is a synthesizer created from the original data. The next step will be generating more synthetic data from the synthesizer build or from this linear line equation. In this way, the new data can be used for studies and research, and it protects the confidentiality of the original data. David Jensen from the Knowledge Discovery Laboratory explains how to generate synthetic data: "Researchers frequently need to explore the effects of certain data characteristics on their data model." To help construct datasets exhibiting specific properties, such as auto-correlation or degree disparity, proximity can generate synthetic data having one of several types of graph structure: random graphs that are generated by some random process; lattice graphs having a ring structure; lattice graphs having a grid structure, etc. In all cases, the data generation process follows the same process: Generate the empty graph structure. Generate attribute values based on user-supplied prior probabilities. Since the attribute values of one object may depend on the attribute values of related objects, the attribute generation process assigns values collectively. == Applications == === Fraud detection and confidentiality systems === Testing and training fraud detection and confidentiality systems are devised using synthetic data. Specific algorithms and generators are designed to create realistic data, which then assists in teaching a system how to react to certain situations or criteria. For example, intrusion detection software is tested using synthetic data. This data is a representation of the authentic data and may include intrusion instances that are not found in the authentic data. The synthetic data allows the software to recognize these situations and react accordingly. If synthetic data was not used, the software would only be trained to react to the situations provided by the authentic data and it may not recognize another type of intrusion. === Scientific research === Researchers doing clinical trials or any other research may generate synthetic data to aid in creating a baseline for future studies and testing. Real data can contain information that researchers may not want released, so synthetic data is sometimes used to protect the privacy and confidentiality of a dataset. Using synthetic data reduces confidentiality and privacy issues since it holds no personal information and cannot be traced back to any individual. Beyond privacy protection, synthetic data is also being explored for methodological innovation in drug development. For instance, synthetic data may be used to construct synthetic control arms as an alternative to conventional external control arms based on real-world data (RWD) or randomized controlled trials (RCTs). Collectively, regulatory agencies such as the FDA and EMA appear to be at various stages of recognizing and integrating AI-generated synthetic data into their methodologies. While there is growing consensus on the potential of such data to support model development and the broader lifecycle of medicinal products, to date no drug or medical device has been approved using solely or predominantly synthetic data—particularly not as a comparator arm generated entirely via data-driven algorithms. The quality and statistical handling of synthetic data are expected to become more prominent in future regulatory discussions, particularly in contexts such as predictive modeling (e.g., digital twins), where innovative approaches have already been referenced. === Machine learning === Synthetic data is increasingly being used for machine learning applications: a model is trained on a synthetically generated dataset with the intention of transfer learning to real data. Efforts have been made to enable more data science experiments via the construction of general-purpose synthetic data generators, such as the Synthetic Data Vault. In general, synthetic data has several natural advantages: once the synthetic environment is ready, it is fast and cheap to produce as much data as needed; synthetic data can have perfectly accurate labels, including labeling that may be very expensive or impo

    Read more →
  • Read–write conflict

    Read–write conflict

    In computer science, in the field of databases, read–write conflict, also known as unrepeatable reads, is a computational anomaly associated with interleaved execution of transactions. Specifically, a read–write conflict occurs when a "transaction requests to read an entity for which an unclosed transaction has already made a write request." Given a schedule S S = [ T 1 T 2 R ( A ) R ( A ) W ( A ) C o m . R ( A ) W ( A ) C o m . ] {\displaystyle S={\begin{bmatrix}T1&T2\\R(A)&\\&R(A)\\&W(A)\\&Com.\\R(A)&\\W(A)&\\Com.&\end{bmatrix}}} In this example, T1 has read the original value of A, and is waiting for T2 to finish. T2 also reads the original value of A, overwrites A, and commits. However, when T1 reads from A, it discovers two different versions of A, and T1 would be forced to abort, because T1 would not know what to do. This is an unrepeatable read. This could never occur in a serial schedule, in which each transaction executes in its entirety before another begins. Strict two-phase locking (Strict 2PL) or Serializable Snapshot Isolation (SSI) prevent this conflict. == Real-world example == Alice and Bob are using a website to book tickets for a specific show. Only one ticket is left for the specific show. Alice signs on first to see that only one ticket is left, and finds it expensive. Alice takes time to decide. Bob signs on and also finds one ticket left, and orders it instantly. Bob purchases and logs off. Alice decides to buy a ticket, to find there are no tickets. This is a typical read–write conflict situation.

    Read more →
  • Rendezvous hashing

    Rendezvous hashing

    Rendezvous or highest random weight (HRW) hashing is an algorithm that allows clients to achieve distributed agreement on a set of k {\displaystyle k} options out of a possible set of n {\displaystyle n} options. A typical application is when clients need to agree on which sites (or proxies) objects are assigned to. Consistent hashing addresses the special case k = 1 {\displaystyle k=1} using a different method. Rendezvous hashing is both much simpler and more general than consistent hashing (see below). == History == Rendezvous hashing was invented by David Thaler and Chinya Ravishankar at the University of Michigan in 1996. Consistent hashing appeared a year later in the literature. Given its simplicity and generality, rendezvous hashing is now being preferred to consistent hashing in real-world applications. Rendezvous hashing was used very early on in many applications including mobile caching, router design, secure key establishment, and sharding and distributed databases. Other examples of real-world systems that use Rendezvous Hashing include the GitHub load balancer, the Apache Ignite distributed database, the Tahoe-LAFS file store, the CoBlitz large-file distribution service, Apache Druid, IBM's Cloud Object Store, the Arvados Data Management System, Apache Kafka, and the Twitter EventBus pub/sub platform. One of the first applications of rendezvous hashing was to enable multicast clients on the Internet (in contexts such as the MBONE) to identify multicast rendezvous points in a distributed fashion. It was used in 1998 by Microsoft's Cache Array Routing Protocol (CARP) for distributed cache coordination and routing. Some Protocol Independent Multicast routing protocols use rendezvous hashing to pick a rendezvous point. == Problem definition and approach == === Algorithm === Rendezvous hashing solves a general version of the distributed hash table problem: We are given a set of n {\displaystyle n} sites (servers or proxies, say). How can any set of clients, given an object O {\displaystyle O} , agree on a k-subset of sites to assign to O {\displaystyle O} ? The standard version of the problem uses k = 1. Each client is to make its selection independently, but all clients must end up picking the same subset of sites. This is non-trivial if we add a minimal disruption constraint, and require that when a site fails or is removed, only objects mapping to that site need be reassigned to other sites. The basic idea is to give each site S j {\displaystyle S_{j}} a score (a weight) for each object O i {\displaystyle O_{i}} , and assign the object to the highest scoring site. All clients first agree on a hash function h ( ⋅ ) {\displaystyle h(\cdot )} . For object O i {\displaystyle O_{i}} , the site S j {\displaystyle S_{j}} is defined to have weight w i , j = h ( O i , S j ) {\displaystyle w_{i,j}=h(O_{i},S_{j})} . Each client independently computes these weights w i , 1 , w i , 2 … w i , n {\displaystyle w_{i,1},w_{i,2}\dots w_{i,n}} and picks the k sites that yield the k largest hash values. The clients have thereby achieved distributed k {\displaystyle k} -agreement. If a site S {\displaystyle S} is added or removed, only the objects mapping to S {\displaystyle S} are remapped to different sites, satisfying the minimal disruption constraint above. The HRW assignment can be computed independently by any client, since it depends only on the identifiers for the set of sites S 1 , S 2 … S n {\displaystyle S_{1},S_{2}\dots S_{n}} and the object being assigned. HRW easily accommodates different capacities among sites. If site S k {\displaystyle S_{k}} has twice the capacity of the other sites, we simply represent S k {\displaystyle S_{k}} twice in the list, say, as S k , 1 , S k , 2 {\displaystyle S_{k,1},S_{k,2}} . Clearly, twice as many objects will now map to S k {\displaystyle S_{k}} as to the other sites. === Properties === Consider the simple version of the problem, with k = 1, where all clients are to agree on a single site for an object O. Approaching the problem naively, it might appear sufficient to treat the n sites as buckets in a hash table and hash the object name O into this table. Unfortunately, if any of the sites fails or is unreachable, the hash table size changes, forcing all objects to be remapped. This massive disruption makes such direct hashing unworkable. Under rendezvous hashing, however, clients handle site failures by picking the site that yields the next largest weight. Remapping is required only for objects currently mapped to the failed site, and disruption is minimal. Rendezvous hashing has the following properties: Low overhead: The hash function used is efficient, so overhead at the clients is very low. Load balancing: Since the hash function is randomizing, each of the n sites is equally likely to receive the object O. Loads are uniform across the sites. Site capacity: Sites with different capacities can be represented in the site list with multiplicity in proportion to capacity. A site with twice the capacity of the other sites will be represented twice in the list, while every other site is represented once. High hit rate: Since all clients agree on placing an object O into the same site SO, each fetch or placement of O into SO yields the maximum utility in terms of hit rate. The object O will always be found unless it is evicted by some replacement algorithm at SO. Minimal disruption: When a site fails, only the objects mapped to that site need to be remapped. Disruption is at the minimal possible level. Distributed k-agreement: Clients can reach distributed agreement on k sites simply by selecting the top k sites in the ordering. == O(log n) running time via skeleton-based hierarchical rendezvous hashing == The standard version of Rendezvous Hashing described above works quite well for moderate n, but when n {\displaystyle n} is extremely large, the hierarchical use of Rendezvous Hashing achieves O ( log ⁡ n ) {\displaystyle O(\log n)} running time. This approach creates a virtual hierarchical structure (called a "skeleton"), and achieves O ( log ⁡ n ) {\displaystyle O(\log n)} running time by applying HRW at each level while descending the hierarchy. The idea is to first choose some constant m {\displaystyle m} and organize the n {\displaystyle n} sites into c = ⌈ n / m ⌉ {\displaystyle c=\lceil n/m\rceil } clusters C 1 = { S 1 , S 2 … S m } , C 2 = { S m + 1 , S m + 2 … S 2 m } … {\displaystyle C_{1}=\left\{S_{1},S_{2}\dots S_{m}\right\},C_{2}=\left\{S_{m+1},S_{m+2}\dots S_{2m}\right\}\dots } Next, build a virtual hierarchy by choosing a constant f {\displaystyle f} and imagining these c {\displaystyle c} clusters placed at the leaves of a tree T {\displaystyle T} of virtual nodes, each with fanout f {\displaystyle f} . In the accompanying diagram, the cluster size is m = 4 {\displaystyle m=4} , and the skeleton fanout is f = 3 {\displaystyle f=3} . Assuming 108 sites (real nodes) for convenience, we get a three-tier virtual hierarchy. Since f = 3 {\displaystyle f=3} , each virtual node has a natural numbering in octal. Thus, the 27 virtual nodes at the lowest tier would be numbered 000 , 001 , 002 , . . . , 221 , 222 {\displaystyle 000,001,002,...,221,222} in octal (we can, of course, vary the fanout at each level - in that case, each node will be identified with the corresponding mixed-radix number). The easiest way to understand the virtual hierarchy is by starting at the top, and descending the virtual hierarchy. We successively apply Rendezvous Hashing to the set of virtual nodes at each level of the hierarchy, and descend the branch defined by the winning virtual node. We can in fact start at any level in the virtual hierarchy. Starting lower in the hierarchy requires more hashes, but may improve load distribution in the case of failures. For example, instead of applying HRW to all 108 real nodes in the diagram, we can first apply HRW to the 27 lowest-tier virtual nodes, selecting one. We then apply HRW to the four real nodes in its cluster, and choose the winning site. We only need 27 + 4 = 31 {\displaystyle 27+4=31} hashes, rather than 108. If we apply this method starting one level higher in the hierarchy, we would need 9 + 3 + 4 = 16 {\displaystyle 9+3+4=16} hashes to get to the winning site. The figure shows how, if we proceed starting from the root of the skeleton, we may successively choose the virtual nodes ( 2 ) 3 {\displaystyle (2)_{3}} , ( 20 ) 3 {\displaystyle (20)_{3}} , and ( 200 ) 3 {\displaystyle (200)_{3}} , and finally end up with site 74. The virtual hierarchy need not be stored, but can be created on demand, since the virtual nodes names are simply prefixes of base- f {\displaystyle f} (or mixed-radix) representations. We can easily create appropriately sorted strings from the digits, as required. In the example, we would be working with the strings 0 , 1 , 2 {\displaystyle 0,1,2} (at tier 1), 20 , 21 , 22 {\displaystyle 20,21,22} (at tier 2), and 200 , 201 , 202

    Read more →
  • Prequel (mobile application)

    Prequel (mobile application)

    Prequel, Inc. is an American technology company and mobile app developer known for developing the Prequel mobile application, which enables editing photos and videos with filters and effects generated using artificial intelligence. Prequel was founded in 2018 by Serge Aliseenko and Timur Khabirov, who currently serves as the company's CEO. It is headquartered in New York City. As of August 2022, it had been downloaded more than 100 million times. == History == In 2016, entrepreneur Timur Khabirov and investor Serge Aliseenko registered a US corporation named AIAR Labs Inc, which was developing AR solutions as an outsourced contractor. Of several proprietary products, Prequel was selected for beta-testing as a product focused on editing photos and videos. In 2018, Prequel was released on the Apple App Store. The launch cost $3 million USD, financed with the founders’ personal funds. The first release included approximately 10 filters for photos and the same amount of effects that augmented images with rose petals, rain and snow, VHS and film reel simulations, glitch, grain, sun puddles, and lomography. By June 2020, the app had also been released for Android. In 2021, Prequel founders Timur Khabirov and Serge Aliseenko launched a venture studio for startups working with artificial, computer vision, and AR-based visual art. In December 2022, Prequel reached the number 14 slot on the global rankings for Apple App Store’s Top Charts and the number 5 slot on the App Store’s U.S. charts. In March 2023, Prequel launched a new app called Artique, which is an AI-powered image editing app for businesses. Artique provides advertising and marketing graphic design using ready-made templates that users can customize, while giving suggestions and visual cues through artificial intelligence. Prequel was also one of the companies participating in discussions about artificial intelligence at SXSW 2023. == Features == Prequel describes its app as an "Aesthetic Pic Editor. The app uses artificial intelligence to create and edit content. Prequel can be used to touch up faces on images and videos and can also tie various decorative elements to certain points on the human body and face. Prequel filters include the "Cartoon" filter, which converts selfies into cartoon-style pictures. Other filters include Kidcore, Dust, Grain, Fisheye, Retro Style, Miami, Disco, and VHS-style filters, as well as the ability to create Renaissance-style pictures. Prequel also gives users the ability to apply color correction tools and to make moving images with 3D effects out of 2D images. Prequel allows users to take photos and videos directly through the app and apply filters and effects in real time. The app also comes with manual editing options for photos, such as adjusting the brightness and/or exposure and cropping photos, as well as an option to automatically apply adjustments. The Prequel app uses the Core ML, MNN, and TFLight frameworks to work with its neural networks. Some AI solutions are launched server-side, and some on the user's mobile device. A resulting photo or video edited with the app is called "a prequel." The app daily generates over 2 million such prequels, which are published by users in Instagram, TikTok, and other social media. As of 2022, the app has more than 800 filters and effects, along with video templates and support for GIFs and stickers. Prequel is free-to-use, but has a premium version that gives users access to more effects, filters, and beauty tools. Since its launch in 2018, Prequel has been downloaded more than 100 million times.

    Read more →
  • UI data binding

    UI data binding

    UI data binding is a software design pattern to simplify development of GUI applications. UI data binding binds UI elements to an application domain model. Most frameworks employ the Observer pattern as the underlying binding mechanism. To work efficiently, UI data binding has to address input validation and data type mapping. A bound control is a widget whose value is tied or bound to a field in a recordset (e.g., a column in a row of a table). Changes made to data within the control are automatically saved to the database when the control's exit event triggers. == Example == == Data binding frameworks and tools == === Delphi === DSharp third-party data binding tool OpenWire Visual Live Binding - third-party visual data binding tool === Java === JFace Data Binding JavaFX Property === .NET === Windows Forms data binding overview WPF data binding overview Avalonia Unity 3D data binding framework (available in modifications for NGUI, iGUI and EZGUI libraries) === JavaScript === Angular AngularJS Backbone.js Ember.js Datum.js knockout.js Meteor, via its Blaze live update engine OpenUI5 React Vue.js

    Read more →
  • Flajolet–Martin algorithm

    Flajolet–Martin algorithm

    The Flajolet–Martin algorithm is an algorithm for approximating the number of distinct elements in a stream with a single pass and space-consumption logarithmic in the maximal number of possible distinct elements in the stream (the count-distinct problem). The algorithm was introduced by Philippe Flajolet and G. Nigel Martin in their 1984 article "Probabilistic Counting Algorithms for Data Base Applications". Later it has been refined in "LogLog counting of large cardinalities" by Marianne Durand and Philippe Flajolet, and "HyperLogLog: The analysis of a near-optimal cardinality estimation algorithm" by Philippe Flajolet et al. In their 2010 article "An optimal algorithm for the distinct elements problem", Daniel M. Kane, Jelani Nelson and David P. Woodruff give an improved algorithm, which uses nearly optimal space and has optimal O(1) update and reporting times. == The algorithm == Assume that we are given a hash function h a s h ( x ) {\displaystyle \mathrm {hash} (x)} that maps input x {\displaystyle x} to integers in the range [ 0 ; 2 L − 1 ] {\displaystyle [0;2^{L}-1]} , and where the outputs are sufficiently uniformly distributed. Note that the set of integers from 0 to 2 L − 1 {\displaystyle 2^{L}-1} corresponds to the set of binary strings of length L {\displaystyle L} . For any non-negative integer y {\displaystyle y} , define b i t ( y , k ) {\displaystyle \mathrm {bit} (y,k)} to be the k {\displaystyle k} -th bit in the binary representation of y {\displaystyle y} , such that: y = ∑ k ≥ 0 b i t ( y , k ) 2 k . {\displaystyle y=\sum _{k\geq 0}\mathrm {bit} (y,k)2^{k}.} We then define a function ρ ( y ) {\displaystyle \rho (y)} that outputs the position of the least-significant set bit in the binary representation of y {\displaystyle y} , and L {\displaystyle L} if no such set bit can be found as all bits are zero: ρ ( y ) = { min { k ≥ 0 ∣ b i t ( y , k ) ≠ 0 } y > 0 L y = 0 {\displaystyle \rho (y)={\begin{cases}\min\{k\geq 0\mid \mathrm {bit} (y,k)\neq 0\}&y>0\\L&y=0\end{cases}}} Note that with the above definition we are using 0-indexing for the positions, starting from the least significant bit. For example, ρ ( 13 ) = ρ ( 1101 2 ) = 0 {\displaystyle \rho (13)=\rho (1101_{2})=0} , since the least significant bit is a 1 (0th position), and ρ ( 8 ) = ρ ( 1000 2 ) = 3 {\displaystyle \rho (8)=\rho (1000_{2})=3} , since the least significant set bit is at the 3rd position. At this point, note that under the assumption that the output of our hash function is uniformly distributed, then the probability of observing a hash output ending with 2 k {\displaystyle 2^{k}} (a one, followed by k {\displaystyle k} zeroes) is 2 − ( k + 1 ) {\displaystyle 2^{-(k+1)}} , since this corresponds to flipping k {\displaystyle k} heads and then a tail with a fair coin. Now the Flajolet–Martin algorithm for estimating the cardinality of a multiset M {\displaystyle M} is as follows: Initialize a bit-vector BITMAP to be of length L {\displaystyle L} and contain all 0s. For each element x {\displaystyle x} in M {\displaystyle M} : Calculate the index i = ρ ( h a s h ( x ) ) {\displaystyle i=\rho (\mathrm {hash} (x))} . Set B I T M A P [ i ] = 1 {\displaystyle \mathrm {BITMAP} [i]=1} . Let R {\displaystyle R} denote the smallest index i {\displaystyle i} such that B I T M A P [ i ] = 0 {\displaystyle \mathrm {BITMAP} [i]=0} . Estimate the cardinality of M {\displaystyle M} as 2 R / ϕ {\displaystyle 2^{R}/\phi } , where ϕ ≈ 0.77351 {\displaystyle \phi \approx 0.77351} . The idea is that if n {\displaystyle n} is the number of distinct elements in the multiset M {\displaystyle M} , then B I T M A P [ 0 ] {\displaystyle \mathrm {BITMAP} [0]} is accessed approximately n / 2 {\displaystyle n/2} times, B I T M A P [ 1 ] {\displaystyle \mathrm {BITMAP} [1]} is accessed approximately n / 4 {\displaystyle n/4} times and so on. Consequently, if i ≫ log 2 ⁡ n {\displaystyle i\gg \log _{2}n} , then B I T M A P [ i ] {\displaystyle \mathrm {BITMAP} [i]} is almost certainly 0, and if i ≪ log 2 ⁡ n {\displaystyle i\ll \log _{2}n} , then B I T M A P [ i ] {\displaystyle \mathrm {BITMAP} [i]} is almost certainly 1. If i ≈ log 2 ⁡ n {\displaystyle i\approx \log _{2}n} , then B I T M A P [ i ] {\displaystyle \mathrm {BITMAP} [i]} can be expected to be either 1 or 0. The correction factor ϕ ≈ 0.77351 {\displaystyle \phi \approx 0.77351} (OEIS: A244256) is found by calculations, which can be found in the original article. == Improving accuracy == A problem with the Flajolet–Martin algorithm in the above form is that the results vary significantly. A common solution has been to run the algorithm multiple times with k {\displaystyle k} different hash functions and combine the results from the different runs. One idea is to take the mean of the k {\displaystyle k} results together from each hash function, obtaining a single estimate of the cardinality. The problem with this is that averaging is very susceptible to outliers (which are likely here). A different idea is to use the median, which is less prone to be influences by outliers. The problem with this is that the results can only take form 2 R / ϕ {\displaystyle 2^{R}/\phi } , where R {\displaystyle R} is integer. A common solution is to combine both the mean and the median: Create k ⋅ l {\displaystyle k\cdot l} hash functions and split them into k {\displaystyle k} distinct groups (each of size l {\displaystyle l} ). Within each group use the mean for aggregating together the l {\displaystyle l} results, and finally take the median of the k {\displaystyle k} group estimates as the final estimate. The 2007 HyperLogLog algorithm splits the multiset into subsets and estimates their cardinalities, then it uses the harmonic mean to combine them into an estimate for the original cardinality.

    Read more →
  • ISO 15926

    ISO 15926

    ISO 15926 is a standard for data integration, sharing, exchange, and hand-over between computer systems. The title, "Industrial automation systems and integration—Integration of life-cycle data for process plants including oil and gas production facilities", is regarded too narrow by the present ISO 15926 developers. Having developed a generic data model and reference data library for process plants, it turned out that this subject is already so wide, that actually any state information may be modelled with it. == History == In 1991 a European Union ESPRIT-, named ProcessBase, started. The focus of this research project was to develop a data model for lifecycle information of a facility that would suit the requirements of the process industries. At the time that the project duration had elapsed, a consortium of companies involved in the process industries had been established: EPISTLE (European Process Industries STEP Technical Liaison Executive). Initially individual companies were members, but later this changed into a situation where three national consortia were the only members: PISTEP (UK), POSC/Caesar (Norway), and USPI-NL (Netherlands). (later PISTEP merged into POSC/Caesar, and USPI-NL was renamed to USPI). EPISTLE took over the work of the ProcessBase project. Initially this work involved a standard called ISO 10303-221 (referred to as "STEP AP221"). In that AP221 we saw, for the first time, an Annex M with a list of standard instances of the AP221 data model, including types of objects. These standard instances would be for reference and would act as a knowledge base with knowledge about the types of objects. In the early nineties EPISTLE started an activity to extend Annex M to become a library of such object classes and their relationships: STEPlib. In the STEPlib activities a group of approx. 100 domain experts from all three member consortia, spread over the various expertises (e.g. Electrical, Piping, Rotating equipment, etc.), worked together to define the "core classes". The development of STEPlib was extended with many additional classes and relationships between classes and published as Open source data. Furthermore, the concepts and relation types from the AP221 and ISO 15926-2 data models were also added to the STEPlib dictionary. This resulted in the development of Gellish English, whereas STEPlib became the Gellish English dictionary. Gellish English is a structured subset of natural English and is a modeling language suitable for knowledge modeling, product modeling and data exchange. It differs from conventional modeling languages (meta languages) as used in information technology as it not only defines generic concepts, but also includes an English dictionary. The semantic expression capability of Gellish English was significantly increased by extending the number of relation types that can be used to express knowledge and information. For modelling-technical reasons POSC/Caesar proposed another standard than ISO 10303, called ISO 15926. EPISTLE (and ISO) supported that proposal, and continued the modelling work, thereby writing Part 2 of ISO 15926. This Part 2 has official ISO IS (International Standard) status since 2003. POSC/Caesar started to put together their own RDL (Reference Data Library). They added many specialized classes, for example for ANSI (American National Standards Institute) pipe and pipe fittings. Meanwhile, STEPlib continued its existence, mainly driven by some members of USPI. Since it was clear that it was not in the interest of the industry to have two libraries for, in essence, the same set of classes, the Management Board of EPISTLE decided that the core classes of the two libraries shall be merged into Part 4 of ISO 15926. This merging process has been finished. Part 4 should act as reference data for part 2 of ISO 15926 as well as for ISO 10303-221 and replaced its Annex M. On June 5, 2007 ISO 15926-4 was signed off as a TS (Technical Specification). In 1999 the work on an earlier version of Part 7 started. Initially this was based on XML Schema (the only useful W3C Recommendation available then), but when Web Ontology Language (OWL) became available it was clear that provided a far more suitable environment for Part 7. Part 7 passed the first ISO ballot by the end of 2005, and an implementation project started. A formal ballot for TS (Technical Specification) was planned for December 2007. However, it was decided then to split Part 7 into more than one part, because the scope was too wide. == Need for ISO15926 == In 2004, the National Institute of Standards and Technology (NIST) released a report on the impact of the lack of digital interoperability in the capital projects industry. The report estimated the cost of inadequate interoperability in the U.S. capital facilities industry to be $15.8 billion per year. This was considered likely to be a conservative figure. == The standard == ISO 15926 has thirteen parts (as of February 2022): Part 1 - Overview and fundamental principles Part 2 - Data model Part 3 - Reference data for geometry and topology Part 4 - Reference Data, the terms used within facilities for the process industry Part 6 - Methodology for the development and validation of reference data (under development) Part 7 - Template methodology Part 8 - OWL/RDF implementation Part 9 - Implementation standards, with the focus on standard web servers, web services, and security (under development) Part 10 - Conformance testing Part 11 - Methodology for simplified industrial usage of reference data (under development) Part 12 - Life cycle integration ontology in Web Ontology Language (OWL2) Part 13 - Integrated lifecycle asset planning === Description === The model and the library are suitable for representing lifecycle information about technical installations and their components. They can also be used for defining the terms used in product catalogs in e-commerce. Another, more limited, use of the standard is as a reference classification for harmonization purposes between shared databases and product catalogues that are not based on ISO 15926. The purpose of ISO 15926 is to provide a Lingua Franca for computer systems, thereby integrating the information produced by them. Although set up for the process industries with large projects involving many parties, and involving plant operations and maintenance lasting decades, the technology can be used by anyone willing to set up a proper vocabulary of reference data in line with Part 4. In Part 7 the concept of Templates is introduced. These are semantic constructs, using Part 2 entities, that represent a small piece of information. These constructs then are mapped to more efficient classes of n-ary relations that interlink the Nodes that are involved in the represented information. In Part 8 the Part 7 Templates are defined in OWL and instantiated in RDF. For validation and reasoning purposes all are represented in First-Order Logic as well. In Part 9 these Node and Template instances are stored in an RDF triple store, set up to a standard schema and an API. Each participating computer system maps its data from its internal format to such ISO-standard Node and Template instances. Data can be "handed over" from one triple store to another in cases where data custodianship is handed over (e.g. from a contractor to a plant owner, or from a manufacturer to the owners of the manufactured goods). Hand-over can be for a part of all data, whilst maintaining full referential integrity. Documents are user-definable. They are defined in XML Schema and they are, in essence, only a structure containing cells that make reference to instances of Templates. This represents a view on all lifecycle data: since the data model is a 4D (space-time) model, it is possible to present the data that was valid at any given point in time, thus providing a true historical record. It is expected that this will be used for Knowledge Mining. Data can be queried by means of SPARQL. In any implementation a restricted number of triple stores can be involved, with different access rights. This is done by means of creating a CPF Server (= Confederation of Participating Façades). An Ontology Browser allows for access to one or more triple stores in a given CPF, depending on the access rights. == Projects and applications == There are a number of projects working on the extension of the ISO 15926 standard in different application areas. === Capital-intensive projects === Within the application of Capital Intensive projects, some cooperating implementation projects are running: The DEXPI project: The objective of DEXPI is to develop and promote a general standard for the process industry covering all phases of the lifecycle of a (petro-)chemical plant, ranging from specification of functional requirements to assets in operation. Finalised projects include: The EDRC Project of FIATECH Capturing Equipment Data Requirements Using ISO 15926 and Assessing Conforma

    Read more →
  • Transparency in the software supply chain

    Transparency in the software supply chain

    Transparency in the software supply chain is a condition in which participants involved in the development, procurement, operation, auditing, or regulation of software can determine which components, dependencies, build stages, identifiers, and relationships within the supply chain make up the delivered product. The disclosure of information about software components, their interrelationships, origins, and development methods—for the purposes of risk management, vulnerability detection, and compliance—takes place throughout the software lifecycle. Transparency is one of the key security attributes of the software supply chain, as a deeper understanding of the chain enables participants to identify vulnerabilities and mitigate threats. Problems in the software supply chain can cause billions in losses and create operational challenges for government and commercial entities, as demonstrated by incidents involving SolarWinds, Bybit, 3CX, Jaguar Land Rover, GitHub, and NotPetya. Modern software is often assembled from third-party libraries and open-source components. According to research by the Linux Foundation and Synopsys, 96% of the commercial codebases analyzed contained open-source software, and 70–90% of a typical codebase may consist of open-source components. Without transparency, any software component can become a threat. As a result, companies may spend billions of dollars building robust external defenses, but this will not protect against vulnerabilities in legitimate software inside the perimeter. At the same time, supply chain attacks also erode trust between customers and their IT providers, as malicious code is often embedded in official updates with certificates and digital signatures. One of the primary ways to ensure transparency is through a software bill of materials, which documents the components used to create the software and the relationships within the supply chain. == Concept == The software supply chain is the collection of systems, devices, people, artifacts, and processes involved in the creation of the final software product. Attacks on the software supply chain differ from conventional attacks in that they follow a four-stage pattern: compromise, modification, distribution, and subsequent exploitation of the compromised or modified component. A defining feature of a supply chain attack is the introduction or manipulation of a change at an upstream stage, which is subsequently exploited at a downstream stage. Transparency refers to the availability of knowledge about the chain, while validity concerns the integrity of operations and artifacts and the authentication of participants, and separation involves reducing unnecessary trust relationships and the radius of impact through compartmentalization. In this framework, transparency primarily helps during the pre-compromise and detection phases, as a clearer understanding of participants, operations, and artifacts makes it easier to identify weak links before attackers exploit them. Current major attack vectors include dependencies and containers, build infrastructure, and human participants, such as maintainers or developers. == History == Software supply-chain transparency developed from earlier efforts to document software components, long before the term came into widespread use in the cybersecurity field. Early component-documentation formats included SPDX, first published in 2011, and CycloneDX, first published in 2017. Initially, these formats were created to support license compliance, package identification, and tool compatibility. Their development helped shape a broader concept of software supply chain transparency, encompassing component documentation, disclosure practices, risk management, security analysis, and regulatory compliance. In 2018, the U.S. National Telecommunications and Information Administration launched a multistakeholder process on promoting software component transparency. This process helped move work on SBOMs from a specialized technical practice into the realm of policy and procurement to identify components used in software products. The 2020 compromise of the SolarWinds Orion platform made software supply chain security a central issue in government cybersecurity policy. An analysis of the “Sunburst” campaign prepared by the Atlantic Council noted that the vulnerability of the software supply chain had become a realized risk for national-security agencies. In May 2021, U.S. President Joe Biden issued Executive Order 14028, which directed federal agencies to improve cybersecurity and increase transparency in the software supply chain, including requirements related to SBOMs. Reuters reported that the executive order required software developers selling their products to the federal government to provide greater visibility into their software and make security data available. In July 2021, the NTIA published the document “The Minimum Elements for a Software Bill of Materials (SBOM)”, defining the basic data fields and practices for creating SBOMs. Between 2021 and 2025, the U.S. Cybersecurity and Infrastructure Security Agency updated its guidance on “Framing Software Component Transparency”, expanding the set of SBOM attributes, metadata requirements, and operational recommendations for the creation, exchange, and use of SBOMs. Major incidents that occurred following the SolarWinds attack have underscored the importance of transparency in vulnerability management and supply chain security. The Log4Shell vulnerability in the Log4j library, disclosed in December 2021, demonstrated how difficult it can be for organizations to identify a vulnerable component deeply embedded within applications and services. In 2024, an attempt to plant a backdoor in XZ Utils showed how attackers could exploit trust in open-source maintenance processes to introduce malicious code into widely used infrastructure software. By the mid-2020s, software supply chain transparency had become part of international cybersecurity coordination and regulation. On September 3, 2025, Japan's Ministry of Economy, Trade and Industry and the National Cybersecurity Office, in collaboration with cybersecurity agencies from 15 countries, released the document “A Shared Vision of Software Bill of Materials (SBOM) for Cybersecurity.” In the European Union, the Cyber Resilience Act required manufacturers of products with digital elements to create, maintain, and retain SBOMs as part of the technical documentation for software placed on the EU market. == Transparency mechanisms == The primary mechanism for ensuring transparency is the software bill of materials (SBOM). An SBOM is a structured list of components, libraries, and tools used to build and distribute a software product, and it records dependencies in a way that helps organizations understand and assess their software supply chains. It can also be described as a formal record of components and their interdependencies, which gives users insight into their actual exposure to risks and threats. Five key areas of SBOM application in software supply chain security have been identified: vulnerability management, ensuring transparency, component evaluation, risk assessment, and ensuring supply chain integrity. In software supply chains, an SBOM documents all components, both open-source and proprietary. Under Executive Order 14028, U.S. federal agencies require software suppliers to provide SBOMs for government-procured software. The list of minimum required SBOM elements defined by NTIA includes three main categories: required data fields for describing each component (name, version, identifiers), automation support (machine-readable format, generation tools), and recommendations for creating SBOMs during development and purchasing. The post-2021 push for SBOMs was intended to provide visibility into the components used within software and to expose parts of an application that would otherwise remain hidden. This information can be used to prioritize patches, manage vulnerabilities, and support compliance work. Transparency also supports software traceability, which is becoming a standard feature of developer platforms. Traceability has become important because organizations are increasingly required to demonstrate how software was created, rather than simply listing its components. Higher levels of assurance require signed, tamper-proof traceability and more isolated, verifiable build environments. A related mechanism is build reproducibility. Reproducible builds are defined as build processes that make the compilation process deterministic, ensuring that the same source code always produces the same binary file. These builds are considered a foundational element for distributed verification, transparency-log maintenance, supply-chain workflow integration, and the creation of keyless signatures based on verifiable logs. Although reproducibility does not replace inventory or attestation, it gives external par

    Read more →
  • Encyclopaedistics

    Encyclopaedistics

    Encyclopaedistics or encyclopaedics as a discipline, is the academic scholarship of encyclopedias as sources of encyclopedic knowledge and cultural objects as well; in this sense, this discipline is also known as "encyclopaedia studies" and can be termed as "theoretical encyclopaediography" by analogy with theoretical lexicography. Encyclopaedistics as a practical activity (profession or business) also called "encyclopaedic practice" or "encyclopedism" is the process of assembling encyclopaedias available to the public for sale or for free (encyclopaedia publishing or practical encyclopediography). In this sense, it is the art or craft of writing, compiling, and editing the paper or online encyclopedias. As a practical activity, encyclopaedistics originated in the Middle Ages in connection with the development of compendiums based on alphabetical structuring (e.g. first edition of Polyanthea by Dominicus Nanus Mirabellius). Encyclopaedistics is often defined as "the art and science of selecting and disseminating the information most significant to mankind". == Field of study == Encyclopaedistics is a specialized aspect of information science and communication science. At the same time, encyclopaedistics is also considered as one of scholarly disciplines which are seen as auxiliary for historical research (auxiliary sciences of history) . Third, encyclopaedics is a domain of philosophy (Romanticism). This term associated with German philosophers of the 18th century, such as Novalis, Friedrich Schlegel, who sought to create a "Scientific Bible" - both real and ideal book as the quintessence of human education (enlightenment). In any case, the most popular topics in encyclopaedia studies refferd the history of organization of encyclopaedic knowledge, encyclopaedic knowledge determination and selection, glossary composition, current state of development of encyclopaedic activity, features of making encyclopaedias and encyclopaedic articles, usage, role and significance of encyclopaedias, typology of encyclopaedic literature, encyclopaedists and encyclopaedic schools, opposition of classical encyclopaedias and Wikipedia as well as paper encyclopaedias and online encyclopaedias, case experience in building encyclopedias etc. In general, scholarly studies contribute to appearance of successful well-crafted encyclopaedias with high-quality articles. == Contemporary encyclopaedic practice == Today, academic institutions, universities, and publishing companies worldwide are engaged in encyclopaedic activity building national, multinational (universal), regional and subject-specific encyclopaedias, or doing studies related encyclopaedias. The development of national encyclopaedias is one of the prerogatives of the European Parliament in the policy of protection of accurate and verified information and in the fight against mis- and disinformation as well as in the policy of protecting, promoting and projecting Europe's values and interests in the world.

    Read more →
  • Algorithm IMED

    Algorithm IMED

    In multi-armed bandit problems, IMED (for Indexed Minimum Empirical Divergence) is an algorithm developed in 2015 by Junya Honda and Akimichi Takemura. It is the first algorithm proved to be asymptotically optimal respect to the problem-dependant Lai–Robbins lower bound for distributions in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} . == Multi-armed bandit problem == The Multi-armed bandit problem is a sequential game where one player has to choose at each turn between K {\displaystyle K} actions (arms). Behind every arm a {\displaystyle a} there is an unknown distribution ν a {\displaystyle \nu _{a}} that lies in a set D {\displaystyle {\mathcal {D}}} known by the player (for example, D {\displaystyle {\mathcal {D}}} can be the set of Gaussian distributions or Bernoulli distributions). At each turn t {\displaystyle t} the player chooses (pulls) an arm a t {\displaystyle a_{t}} , he then gets an observation X t {\displaystyle X_{t}} of the distribution ν a t {\displaystyle \nu _{a_{t}}} . === Regret minimization === The goal is to minimize the regret at time T {\displaystyle T} that is defined as R T := ∑ a = 1 K Δ a E [ N a ( T ) ] {\displaystyle R_{T}:=\sum _{a=1}^{K}\Delta _{a}\mathbb {E} [N_{a}(T)]} where μ a := E [ ν a ] {\displaystyle \mu _{a}:=\mathbb {E} [\nu _{a}]} is the mean of arm a {\displaystyle a} μ ∗ := max a μ a {\displaystyle \mu ^{}:=\max _{a}\mu _{a}} is the highest mean Δ a := μ ∗ − μ a {\displaystyle \Delta _{a}:=\mu ^{}-\mu _{a}} N a ( t ) {\displaystyle N_{a}(t)} is the number of pulls of arm a {\displaystyle a} up to turn t {\displaystyle t} The player has to find an algorithm that chooses at each turn t {\displaystyle t} which arm to pull based on the previous actions and observations ( a s , X s ) s < t {\displaystyle (a_{s},X_{s})_{s μ } {\displaystyle {\mathcal {K}}_{inf}(\nu ,\mu ,{\mathcal {D}}):=\inf \left\{\mathrm {KL} (\nu ,{\tilde {\nu }})\ |\ {\tilde {\nu }}\in {\mathcal {P}}([-\infty ,1]),\ \mathbb {E} [{\tilde {\nu }}]>\mu \right\}} K L {\displaystyle \mathrm {KL} } is the Kullback–Leibler divergence P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} is the set of distribution in [ − ∞ , 1 ] {\displaystyle [-\infty ,1]} ν ^ a ( t ) {\displaystyle {\hat {\nu }}_{a}(t)} is the empirical distribution of arm a {\displaystyle a} at turn t {\displaystyle t} μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}^{}(t)} is the highest empirical mean of turn t {\displaystyle t} Remark : For arms a {\displaystyle a} that verify μ ^ a ( t ) = μ ^ ∗ ( t ) {\displaystyle {\hat {\mu }}_{a}(t)={\hat {\mu }}^{}(t)} we have K i n f ( ν ^ a ( t ) , μ ^ ∗ ( t ) ) = 0 {\displaystyle K_{inf}({\hat {\nu }}_{a}(t),{\hat {\mu }}^{}(t))=0} . Then there index is equal to ln ⁡ ( N a ( t ) ) {\displaystyle \ln(N_{a}(t))} === Pseudocode === for each arm i do: n[i] ← 1; nu[i] ← None; mu[i] ← None for t from 1 to K do: select arm t observe reward r n[t] ← n[t] + 1 nu[t] ← update empirical distribution mu[t] ← update empirical mean for t from K+1 to T do: mu ← highest mu for each arm i do: scoreK[i] ← n[i] K_inf(nu[i],mu) scoreN[i] ← ln(n[i]) index[i] ← scoreK[i] + scoreN[i] select arm a with smallest index[a] observe reward r n[a] ← n[a] + 1 nu[a] ← update empirical distribution mu[a] ← update empirical mean == Theoretical results == In the multi-armed bandit problem we have the asymptotic Lai–Robbins lower bound asymptotic lower bound on regret. The algorithm IMED is the first algorithm that matches this lower bound for distribution in ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} in the first order. If the distribution are also bounded then it also match the second order. It is the first algorithm that match the second under of this lower bound. === Lai–Robbins lower bound === In 1985 Lai and Robbins proved an asymptotic, problem-dependent lower bound on regret. In 2018, Aurelien Garivier, Pierre Menard and Gilles Stoltz proved a refined lower bound that gives the second order It states that for every consistent algorithm on the set P ( [ − ∞ , 1 ] ) {\displaystyle {\mathcal {P}}([-\infty ,1])} — that is, an algorithm for which, for every ( ν 1 , … , ν K ) ∈ P ( [ − ∞ , 1 ] ) K {\displaystyle (\nu _{1},\dots ,\nu _{K})\in {\mathcal {P}}([-\infty ,1])^{K}} , the regret R T {\displaystyle R_{T}} is subpolynomial (i.e. R T = o T → + ∞ ( T α ) {\displaystyle R_{T}=o_{T\to +\infty }(T^{\alpha })} for all α > 0 {\displaystyle \alpha >0} ) — we have: R T ≥ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − Ω T → + ∞ ( ln ⁡ ln ⁡ T ) . {\displaystyle R_{T}\geq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-\Omega _{T\to +\infty }(\ln \ln T).} This bound is asymptotic (as T → + ∞ {\displaystyle T\to +\infty } ) and gives a first-order lower bound of order ln ⁡ T {\displaystyle \ln T} with the optimal constant in front of it and the second order in − Ω ( ln ⁡ ln ⁡ T ) {\displaystyle -\Omega (\ln \ln T)} . === Regret bound for IMED === If the distribution of every arm a {\displaystyle a} is ( − ∞ , 1 ] {\displaystyle (-\infty ,1]} ( i.e. ν a ∈ P ( [ − ∞ , 1 ] ) ) {\displaystyle \nu _{a}\in {\mathcal {P}}([-\infty ,1]))} then the regret of the algorithm IMED verify R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T + O ( 1 ) {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T+O(1)} If all the distribution ν a {\displaystyle \nu _{a}} are bounded then it exists a constant C > 0 {\displaystyle C>0} such that for T {\displaystyle T} large enough the regret of IMED is upper bounded by R T ≤ ( ∑ a : μ a < μ ∗ Δ a K inf ( ν a , μ ∗ ) ) ln ⁡ T − C ln ⁡ ln ⁡ T {\displaystyle R_{T}\leq \left(\sum _{a:\mu _{a}<\mu ^{}}{\frac {\Delta _{a}}{{\mathcal {K}}_{\inf }(\nu _{a},\mu ^{})}}\right)\ln T-C\ln \ln T} == Computation time == The algorithm only requiere to compute the K i n f {\displaystyle K_{inf}} for suboptimal arms who are pulled O ( ln ⁡ T ) {\displaystyle O(\ln T)} times, which make it a lot faster than KL-UCB. A faster version of IMED was developed in 2023 to make it even faster, using a Taylor development of the K i n f {\displaystyle K_{inf}} in the first order .

    Read more →
  • Xulvi-Brunet–Sokolov algorithm

    Xulvi-Brunet–Sokolov algorithm

    Xulvi-Brunet and Sokolov's algorithm generates networks with chosen degree correlations. This method is based on link rewiring, in which the desired degree is governed by parameter ρ. By varying this single parameter it is possible to generate networks from random (when ρ = 0) to perfectly assortative or disassortative (when ρ = 1). This algorithm allows to keep network's degree distribution unchanged when changing the value of ρ. == Assortative model == In assortative networks, well-connected nodes are likely to be connected to other highly connected nodes. Social networks are examples of assortative networks. This means that an assortative network has the property that almost all nodes with the same degree are linked only between themselves. The Xulvi-Brunet–Sokolov algorithm for this type of networks is the following. In a given network, two links connecting four different nodes are chosen randomly. These nodes are ordered by their degrees. Then, with probability ρ, the links are randomly rewired in such a way that one link connects the two nodes with the smaller degrees and the other connects the two nodes with the larger degrees. If one or both of these links already existed in the network, the step is discarded and is repeated again. Thus, there will be no self-connected nodes or multiple links connecting the same two nodes. Different degrees of assortativity of a network can be achieved by changing the parameter ρ. Assortative networks are characterized by highly connected groups of nodes with similar degree. As assortativity grows, the average path length and clustering coefficient increase. == Disassortative model == In disassortative networks, highly connected nodes tend to connect to less-well-connected nodes with larger probability than in uncorrelated networks. Examples of such networks include biological networks. The Xulvi-Brunet and Sokolov's algorithm for this type of networks is similar to the one for assortative networks with one minor change. As before, two links of four nodes are randomly chosen and the nodes are ordered with respect to their degrees. However, in this case, the links are rewired (with probability p) such that one link connects the highest connected node with the node with the lowest degree and the other link connects the two remaining nodes randomly with probability 1 − ρ. Similarly, if the new links already existed, the previous step is repeated. This algorithm does not change the degree of nodes and thus the degree distribution of the network.

    Read more →
  • Computer Dreams

    Computer Dreams

    Computer Dreams is a 1988 film created by Digital Vision Entertainment and released by MPI Home Video. Written, produced and directed by Geoffrey de Valois and hosted by Amanda Pays, it consists primarily of clips and behind-the-scenes work of early computer graphics animation. Notably included are Luxo Jr. and Red's Dream, the first two short films from Pixar. The film is an hour long and features an electronic score by Music Fantastic. It was revised and re-released on DVD as The History of Computer Animation, Volume 2. It won the Winner Gold Special Jury Award at the 1989 Houston International Film Festival, and the 1989 Golden Decade Award from the US Film & Video Festival. Music used includes: Gail Lennon - Desire, Gail Lennon - Like A Dream, Shandi Sinnamon - Making It,

    Read more →
  • Whitehead's algorithm

    Whitehead's algorithm

    Whitehead's algorithm is a mathematical algorithm in group theory for solving the automorphic equivalence problem in the finite rank free group Fn. The algorithm is based on a classic 1936 paper of J. H. C. Whitehead. It is still unknown (except for the case n = 2) if Whitehead's algorithm has polynomial time complexity. == Statement of the problem == Let F n = F ( x 1 , … , x n ) {\displaystyle F_{n}=F(x_{1},\dots ,x_{n})} be a free group of rank n ≥ 2 {\displaystyle n\geq 2} with a free basis X = { x 1 , … , x n } {\displaystyle X=\{x_{1},\dots ,x_{n}\}} . The automorphism problem, or the automorphic equivalence problem for F n {\displaystyle F_{n}} asks, given two freely reduced words w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} whether there exists an automorphism φ ∈ Aut ⁡ ( F n ) {\displaystyle \varphi \in \operatorname {Aut} (F_{n})} such that φ ( w ) = w ′ {\displaystyle \varphi (w)=w'} . Thus the automorphism problem asks, for w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} whether Aut ⁡ ( F n ) w = Aut ⁡ ( F n ) w ′ {\displaystyle \operatorname {Aut} (F_{n})w=\operatorname {Aut} (F_{n})w'} . For w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} one has Aut ⁡ ( F n ) w = Aut ⁡ ( F n ) w ′ {\displaystyle \operatorname {Aut} (F_{n})w=\operatorname {Aut} (F_{n})w'} if and only if Out ⁡ ( F n ) [ w ] = Out ⁡ ( F n ) [ w ′ ] {\displaystyle \operatorname {Out} (F_{n})[w]=\operatorname {Out} (F_{n})[w']} , where [ w ] , [ w ′ ] {\displaystyle [w],[w']} are conjugacy classes in F n {\displaystyle F_{n}} of w , w ′ {\displaystyle w,w'} accordingly. Therefore, the automorphism problem for F n {\displaystyle F_{n}} is often formulated in terms of Out ⁡ ( F n ) {\displaystyle \operatorname {Out} (F_{n})} -equivalence of conjugacy classes of elements of F n {\displaystyle F_{n}} . For an element w ∈ F n {\displaystyle w\in F_{n}} , | w | X {\displaystyle |w|_{X}} denotes the freely reduced length of w {\displaystyle w} with respect to X {\displaystyle X} , and ‖ w ‖ X {\displaystyle \|w\|_{X}} denotes the cyclically reduced length of w {\displaystyle w} with respect to X {\displaystyle X} . For the automorphism problem, the length of an input w {\displaystyle w} is measured as | w | X {\displaystyle |w|_{X}} or as ‖ w ‖ X {\displaystyle \|w\|_{X}} , depending on whether one views w {\displaystyle w} as an element of F n {\displaystyle F_{n}} or as defining the corresponding conjugacy class [ w ] {\displaystyle [w]} in F n {\displaystyle F_{n}} . == History == The automorphism problem for F n {\displaystyle F_{n}} was algorithmically solved by J. H. C. Whitehead in a classic 1936 paper, and his solution came to be known as Whitehead's algorithm. Whitehead used a topological approach in his paper. Namely, consider the 3-manifold M n = # i = 1 n S 2 × S 1 {\displaystyle M_{n}=\#_{i=1}^{n}\mathbb {S} ^{2}\times \mathbb {S} ^{1}} , the connected sum of n {\displaystyle n} copies of S 2 × S 1 {\displaystyle \mathbb {S} ^{2}\times \mathbb {S} ^{1}} . Then π 1 ( M n ) ≅ F n {\displaystyle \pi _{1}(M_{n})\cong F_{n}} , and, moreover, up to a quotient by a finite normal subgroup isomorphic to Z 2 n {\displaystyle \mathbb {Z} _{2}^{n}} , the mapping class group of M n {\displaystyle M_{n}} is equal to Out ⁡ ( F n ) {\displaystyle \operatorname {Out} (F_{n})} ; see. Different free bases of F n {\displaystyle F_{n}} can be represented by isotopy classes of "sphere systems" in M n {\displaystyle M_{n}} , and the cyclically reduced form of an element w ∈ F n {\displaystyle w\in F_{n}} , as well as the Whitehead graph of [ w ] {\displaystyle [w]} , can be "read-off" from how a loop in general position representing [ w ] {\displaystyle [w]} intersects the spheres in the system. Whitehead moves can be represented by certain kinds of topological "swapping" moves modifying the sphere system. Subsequently, Rapaport, and later, based on her work, Higgins and Lyndon, gave a purely combinatorial and algebraic re-interpretation of Whitehead's work and of Whitehead's algorithm. The exposition of Whitehead's algorithm in the book of Lyndon and Schupp is based on this combinatorial approach. Culler and Vogtmann, in their 1986 paper that introduced the Outer space, gave a hybrid approach to Whitehead's algorithm, presented in combinatorial terms but closely following Whitehead's original ideas. == Whitehead's algorithm == Our exposition regarding Whitehead's algorithm mostly follows Ch.I.4 in the book of Lyndon and Schupp, as well as. === Overview === The automorphism group Aut ⁡ ( F n ) {\displaystyle \operatorname {Aut} (F_{n})} has a particularly useful finite generating set W {\displaystyle {\mathcal {W}}} of Whitehead automorphisms or Whitehead moves. Given w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} the first part of Whitehead's algorithm consists of iteratively applying Whitehead moves to w , w ′ {\displaystyle w,w'} to take each of them to an "automorphically minimal" form, where the cyclically reduced length strictly decreases at each step. Once we find automorphically these minimal forms u , u ′ {\displaystyle u,u'} of w , w ′ {\displaystyle w,w'} , we check if ‖ u ‖ X = ‖ u ′ ‖ X {\displaystyle \|u\|_{X}=\|u'\|_{X}} . If ‖ u ‖ X ≠ ‖ u ′ ‖ X {\displaystyle \|u\|_{X}\neq \|u'\|_{X}} then w , w ′ {\displaystyle w,w'} are not automorphically equivalent in F n {\displaystyle F_{n}} . If ‖ u ‖ X = ‖ u ′ ‖ X {\displaystyle \|u\|_{X}=\|u'\|_{X}} , we check if there exists a finite chain of Whitehead moves taking u {\displaystyle u} to u ′ {\displaystyle u'} so that the cyclically reduced length remains constant throughout this chain. The elements w , w ′ {\displaystyle w,w'} are not automorphically equivalent in F n {\displaystyle F_{n}} if and only if such a chain exists. Whitehead's algorithm also solves the search automorphism problem for F n {\displaystyle F_{n}} . Namely, given w , w ′ ∈ F n {\displaystyle w,w'\in F_{n}} , if Whitehead's algorithm concludes that Aut ⁡ ( F n ) w = Aut ⁡ ( F n ) w ′ {\displaystyle \operatorname {Aut} (F_{n})w=\operatorname {Aut} (F_{n})w'} , the algorithm also outputs an automorphism φ ∈ Aut ⁡ ( F n ) {\displaystyle \varphi \in \operatorname {Aut} (F_{n})} such that φ ( w ) = w ′ {\displaystyle \varphi (w)=w'} . Such an element φ ∈ Aut ⁡ ( F n ) {\displaystyle \varphi \in \operatorname {Aut} (F_{n})} is produced as the composition of a chain of Whitehead moves arising from the above procedure and taking w {\displaystyle w} to w ′ {\displaystyle w'} . === Whitehead automorphisms === A Whitehead automorphism, or Whitehead move, of F n {\displaystyle F_{n}} is an automorphism τ ∈ Aut ⁡ ( F n ) {\displaystyle \tau \in \operatorname {Aut} (F_{n})} of F n {\displaystyle F_{n}} of one of the following two types: There is a permutation σ ∈ S n {\displaystyle \sigma \in S_{n}} of { 1 , 2 , … , n } {\displaystyle \{1,2,\dots ,n\}} such that for i = 1 , … , n {\displaystyle i=1,\dots ,n} τ ( x i ) = x σ ( i ) ± 1 {\displaystyle \tau (x_{i})=x_{\sigma (i)}^{\pm 1}} Such τ {\displaystyle \tau } is called a Whitehead automorphism of the first kind. There is an element a ∈ X ± 1 {\displaystyle a\in X^{\pm 1}} , called the multiplier, such that for every x ∈ X ± 1 {\displaystyle x\in X^{\pm 1}} τ ( x ) ∈ { x , x a , a − 1 x , a − 1 x a } . {\displaystyle \tau (x)\in \{x,xa,a^{-1}x,a^{-1}xa\}.} Such τ {\displaystyle \tau } is called a Whitehead automorphism of the second kind. Since τ {\displaystyle \tau } is an automorphism of F n {\displaystyle F_{n}} , it follows that τ ( a ) = a {\displaystyle \tau (a)=a} in this case. Often, for a Whitehead automorphism τ ∈ Aut ⁡ ( F n ) {\displaystyle \tau \in \operatorname {Aut} (F_{n})} , the corresponding outer automorphism in Out ⁡ ( F n ) {\displaystyle \operatorname {Out} (F_{n})} is also called a Whitehead automorphism or a Whitehead move. ==== Examples ==== Let F 4 = F ( x 1 , x 2 , x 3 , x 4 ) {\displaystyle F_{4}=F(x_{1},x_{2},x_{3},x_{4})} . Let τ : F 4 → F 4 {\displaystyle \tau :F_{4}\to F_{4}} be a homomorphism such that τ ( x 1 ) = x 2 x 1 , τ ( x 2 ) = x 2 , τ ( x 3 ) = x 2 x 3 x 2 − 1 , τ ( x 4 ) = x 4 {\displaystyle \tau (x_{1})=x_{2}x_{1},\quad \tau (x_{2})=x_{2},\quad \tau (x_{3})=x_{2}x_{3}x_{2}^{-1},\quad \tau (x_{4})=x_{4}} Then τ {\displaystyle \tau } is actually an automorphism of F 4 {\displaystyle F_{4}} , and, moreover, τ {\displaystyle \tau } is a Whitehead automorphism of the second kind, with the multiplier a = x 2 − 1 {\displaystyle a=x_{2}^{-1}} . Let τ ′ : F 4 → F 4 {\displaystyle \tau ':F_{4}\to F_{4}} be a homomorphism such that τ ′ ( x 1 ) = x 1 , τ ′ ( x 2 ) = x 1 − 1 x 2 x 1 , τ ′ ( x 3 ) = x 1 − 1 x 3 x 1 , τ ′ ( x 4 ) = x 1 − 1 x 4 x 1 {\displaystyle \tau '(x_{1})=x_{1},\quad \tau '(x_{2})=x_{1}^{-1}x_{2}x_{1},\quad \tau '(x_{3})=x_{1}^{-1}x_{3}x_{1},\quad \tau '(x_{4})=x_{1}^{-1}x_{4}x_{1}} Then τ ′ {\displaystyle \tau '} is actually an inner automorphism of F 4 {\displaystyle F_{4}} given by conjugation by x 1 {\displaystyle x_{1}} , and, moreover, τ ′ {\displaystyle \

    Read more →
  • Kleene's algorithm

    Kleene's algorithm

    In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given nondeterministic finite automaton (NFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages. Alternative presentations of the same method include the "elimination method" attributed to Brzozowski and McCluskey, the algorithm of McNaughton and Yamada, and the use of Arden's lemma. == Algorithm description == According to Gross and Yellen (2004), the algorithm can be traced back to Kleene (1956). A presentation of the algorithm in the case of deterministic finite automata (DFAs) is given in Hopcroft and Ullman (1979). The presentation of the algorithm for NFAs below follows Gross and Yellen (2004). Given a nondeterministic finite automaton M = (Q, Σ, δ, q0, F), with Q = { q0,...,qn } its set of states, the algorithm computes the sets Rkij of all strings that take M from state qi to qj without going through any state numbered higher than k. Here, "going through a state" means entering and leaving it, so both i and j may be higher than k, but no intermediate state may. Each set Rkij is represented by a regular expression; the algorithm computes them step by step for k = -1, 0, ..., n. Since there is no state numbered higher than n, the regular expression Rn0j represents the set of all strings that take M from its start state q0 to qj. If F = { q1,...,qf } is the set of accept states, the regular expression Rn01 | ... | Rn0f represents the language accepted by M. The initial regular expressions, for k = -1, are computed as follows for i≠j: R−1ij = a1 | ... | am where qj ∈ δ(qi,a1), ..., qj ∈ δ(qi,am) and as follows for i=j: R−1ii = a1 | ... | am | ε where qi ∈ δ(qi,a1), ..., qi ∈ δ(qi,am) In other words, R−1ij mentions all letters that label a transition from i to j, and we also include ε in the case where i=j. After that, in each step the expressions Rkij are computed from the previous ones by Rkij = Rk-1ik (Rk-1kk) Rk-1kj | Rk-1ij Another way to understand the operation of the algorithm is as an "elimination method", where the states from 0 to n are successively removed: when state k is removed, the regular expression Rk-1ij, which describes the words that label a path from state i>k to state j>k, is rewritten into Rkij so as to take into account the possibility of going via the "eliminated" state k. By induction on k, it can be shown that the length of each expression Rkij is at most ⁠1/3⁠(4k+1(6s+7) - 4) symbols, where s denotes the number of characters in Σ. Therefore, the length of the regular expression representing the language accepted by M is at most ⁠1/3⁠(4n+1(6s+7)f - f - 3) symbols, where f denotes the number of final states. This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size. In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from 0 to n. == Example == The automaton shown in the picture can be described as M = (Q, Σ, δ, q0, F) with the set of states Q = { q0, q1, q2 }, the input alphabet Σ = { a, b }, the transition function δ with δ(q0,a)=q0, δ(q0,b)=q1, δ(q1,a)=q2, δ(q1,b)=q1, δ(q2,a)=q1, and δ(q2,b)=q1, the start state q0, and set of accept states F = { q1 }. Kleene's algorithm computes the initial regular expressions as After that, the Rkij are computed from the Rk-1ij step by step for k = 0, 1, 2. Kleene algebra equalities are used to simplify the regular expressions as much as possible. Step 0 Step 1 Step 2 Since q0 is the start state and q1 is the only accept state, the regular expression R201 denotes the set of all strings accepted by the automaton.

    Read more →