AI Avatar Generator Online Free

AI Avatar Generator Online Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Roposo

    Roposo

    Roposo is an Indian video-sharing social media service, owned by Glance, a subsidiary of InMobi. Roposo provides a space where users can share posts related to different topics like food, comedy, music, poetry, fashion and travel. It is a platform where people express visually with homemade videos and photos. The app offers a TV-like browsing experience with user-generated content on its channels. Users can also use editing tools on the platform and upload their content. == History == Established in July 2014 under Relevant E-solutions Pvt. Ltd., Roposo is the brainchild of three IIT Delhi alumni – Mayank Bhangadia, Avinash Saxena, and Kaushal Shubhank. Under Bhangadia's leadership, the company pivoted from a fashion-based network into a short-form video platform with AI-powered moderation, and its journey was featured as a Harvard Business Publishing case study. In November 2019, Roposo was acquired by InMobi's Glance Digital Experience Pvt. Ltd.(the mobile content platform and part of the InMobi Group). When the Chinese-owned video-sharing app TikTok was banned on 30 June 2020, the app saw a huge spike in users with several TikTok users registering on Roposo. == Technology == The open platform has some features such as a TV-like browsing, different channels, a chat feature that lets buyers and sellers converse directly through the platform, and creation tools such as an option to add voice-over, music and GIF stickers for videos and photos.

    Read more →
  • GNU Octave

    GNU Octave

    GNU Octave is a scientific programming language for scientific computing and numerical computation. Among other things, Octave can be used to solve linear and nonlinear problems numerically and to perform other numerical experiments using a language that is mostly compatible with MATLAB. It may also be used as a batch-oriented language. As part of the GNU Project, it is free software under the terms of the GNU General Public License. == History == The project was conceived around 1988. At first it was intended to be a companion to a chemical reactor design course. Full development was started by John W. Eaton in 1992. The first alpha release dates back to 4 January 1993 and on 17 February 1994 version 1.0 was released. Version 9.2.0 was released on 7 June 2024. The program is named after Octave Levenspiel, a former professor of the principal author. Levenspiel was known for his ability to perform quick back-of-the-envelope calculations. == Development history == == Developments == In addition to use on desktops for personal scientific computing, Octave is used in academia and industry. For example, Octave was used on a massive parallel computer at Pittsburgh Supercomputing Center to find vulnerabilities related to guessing social security numbers. Acceleration with OpenCL or CUDA is also possible with use of GPUs. == Technical details == Octave is written in C++ using the C++ standard library. Octave uses an interpreter to execute the Octave scripting language. Octave is extensible using dynamically loadable modules. Octave interpreter has an OpenGL-based graphics engine to create plots, graphs and charts and to save or print them. Alternatively, gnuplot can be used for the same purpose. Octave includes a graphical user interface (GUI) in addition to the traditional command-line interface (CLI); see #User interfaces for details. == Octave, the language == The Octave language is an interpreted programming language. It is a structured programming language (similar to C) and supports many common C standard library functions, and also certain UNIX system calls and functions. However, it does not support passing arguments by reference although function arguments are copy-on-write to avoid unnecessary duplication. Octave programs consist of a list of function calls or a script. The syntax is matrix-based and provides various functions for matrix operations. It supports various data structures and allows object-oriented programming. Its syntax is very similar to MATLAB, and careful programming of a script will allow it to run on both Octave and MATLAB. Because Octave is made available under the GNU General Public License, it may be freely changed, copied and used. The program runs on Microsoft Windows and most Unix and Unix-like operating systems, including Linux, Android, and macOS. == Notable features == === Command and variable name completion === Typing a TAB character on the command line causes Octave to attempt to complete variable, function, and file names (similar to Bash's tab completion). Octave uses the text before the cursor as the initial portion of the name to complete. === Command history === When running interactively, Octave saves the commands typed in an internal buffer so that they can be recalled and edited. === Data structures === Octave includes a limited amount of support for organizing data in structures. In this example, we see a structure x with elements a, b, and c, (an integer, an array, and a string, respectively): === Short-circuit Boolean operators === Octave's && and || logical operators are evaluated in a short-circuit fashion (like the corresponding operators in the C language), in contrast to the element-by-element operators & and |. === Increment and decrement operators === Octave includes the C-like increment and decrement operators ++ and -- in both their prefix and postfix forms. Octave also does augmented assignment, e.g. x += 5. === Unwind-protect === Octave supports a limited form of exception handling modelled after the unwind_protect of Lisp. The general form of an unwind_protect block looks like this: As a general rule, GNU Octave recognizes as termination of a given block either the keyword end (which is compatible with the MATLAB language) or a more specific keyword endblock or, in some cases, end_block. As a consequence, an unwind_protect block can be terminated either with the keyword end_unwind_protect as in the example, or with the more portable keyword end. The cleanup part of the block is always executed. In case an exception is raised by the body part, cleanup is executed immediately before propagating the exception outside the block unwind_protect. GNU Octave also supports another form of exception handling (compatible with the MATLAB language): This latter form differs from an unwind_protect block in two ways. First, exception_handling is only executed when an exception is raised by body. Second, after the execution of exception_handling the exception is not propagated outside the block (unless a rethrow( lasterror ) statement is explicitly inserted within the exception_handling code). === Variable-length argument lists === Octave has a mechanism for handling functions that take an unspecified number of arguments without explicit upper limit. To specify a list of zero or more arguments, use the special argument varargin as the last (or only) argument in the list. varargin is a cell array containing all the input arguments. === Variable-length return lists === A function can be set up to return any number of values by using the special return value varargout. For example: === C++ integration === It is also possible to execute Octave code directly in a C++ program. For example, here is a code snippet for calling rand([10,1]): C and C++ code can be integrated into GNU Octave by creating oct files, or using the MATLAB compatible MEX files. == MATLAB compatibility == Octave has been built with MATLAB compatibility in mind, and shares many features with MATLAB: % Script: myscript.m a = 5; b = a 2 % Function: myfunc.m function result = myfunc(x) result = x^2 + 3; end Matrices as fundamental data type. Built-in support for complex numbers. Powerful built-in math functions and extensive function libraries. Extensibility in the form of user-defined functions. Octave treats incompatibility with MATLAB as a bug; therefore, it could be considered a software clone, which does not infringe software copyright as per Lotus v. Borland court case. MATLAB scripts from the MathWorks' FileExchange repository in principle are compatible with Octave. However, while they are often provided and uploaded by users under an Octave compatible and proper open source BSD license, the FileExchange Terms of use prohibit any usage beside MathWorks' proprietary MATLAB. === Syntax compatibility === There are a few purposeful, albeit minor, syntax additions Archived 2012-04-26 at the Wayback Machine: Comment lines can be prefixed with the # character as well as the % character; Various C-based operators ++, --, +=, =, /= are supported; Elements can be referenced without creating a new variable by cascaded indexing, e.g. [1:10](3); Strings can be defined with the double-quote " character as well as the single-quote ' character; When the variable type is single (a single-precision floating-point number), Octave calculates the "mean" in the single-domain (MATLAB in double-domain) which is faster but gives less accurate results; Blocks can also be terminated with more specific Control structure keywords, i.e., endif, endfor, endwhile, etc.; Functions can be defined within scripts and at the Octave prompt; Presence of a do-until loop (similar to do-while in C). === Function compatibility === Many, but not all, of the numerous MATLAB functions are available in GNU Octave, some of them accessible through packages in Octave Forge. The functions available as part of either core Octave or Forge packages are listed online Archived 2024-03-14 at the Wayback Machine. A list of unavailable functions is included in the Octave function __unimplemented.m__. Unimplemented functions are also listed under many Octave Forge packages in the Octave Wiki. When an unimplemented function is called the following error message is shown: == User interfaces == Octave comes with an official graphical user interface (GUI) and an integrated development environment (IDE) based on Qt. It has been available since Octave 3.8, and has become the default interface (over the command-line interface) with the release of Octave 4.0. It was well-received by an EDN contributor, who wrote "[Octave] now has a very workable GUI" in reviewing the then-new GUI in 2014. Several 3rd-party graphical front-ends have also been developed, like ToolboX for coding education. == GUI applications == With Octave code, the user can create GUI applications. See GUI Development (GNU Octave (version 7.1.0)). Below are some examples: Button, edit control, checkboxTextboxListbox wit

    Read more →
  • Witness set

    Witness set

    In combinatorics and computational learning theory, a witness set is a set of elements that distinguishes a given Boolean function from a given class of other Boolean functions. Let C {\displaystyle C} be a concept class over a domain X {\displaystyle X} (that is, a family of Boolean functions over X {\displaystyle X} ) and c {\displaystyle c} be a concept in X {\displaystyle X} (a single Boolean function). A subset S {\displaystyle S} of X {\displaystyle X} is a witness set for c {\displaystyle c} in X {\displaystyle X} if S {\displaystyle S} distinguishes c {\displaystyle c} from all the other functions in C {\displaystyle C} , in the sense that no other function in C {\displaystyle C} has the same values on S {\displaystyle S} . For a concept class with | C | {\displaystyle |C|} concepts, there exists a concept that has a witness of size at most log 2 ⁡ | C | {\displaystyle \log _{2}|C|} ; this bound is tight when C {\displaystyle C} consists of all Boolean functions over X {\displaystyle X} . By a result of Bondy (1972) there exists a single witness set of size at most | C | − 1 {\displaystyle |C|-1} that is valid for all concepts in C {\displaystyle C} ; this bound is tight when C {\displaystyle C} consists of the indicator functions of the empty set and some singleton sets. One way to construct this set is to interpret the concepts as bitstrings, and the domain elements as positions in these bitstrings. Then the set of positions at which a trie of the bitstrings branches forms the desired witness set. This construction is central to the operation of the fusion tree data structure. The minimum size of a witness set for c {\displaystyle c} is called the witness size or specification number and is denoted by w C ( c ) {\displaystyle w_{C}(c)} . The value max { w C ( c ) : c ∈ C } {\displaystyle \max\{w_{C}(c):c\in C\}} is called the teaching dimension of C {\displaystyle C} . It represents the number of examples of a concept that need to be presented by a teacher to a learner, in the worst case, to enable the learner to determine which concept is being presented. Witness sets have also been called teaching sets, keys, specifying sets, or discriminants. The "witness set" terminology is from Kushilevitz et al. (1996), who trace the concept of witness sets to work by Cover (1965).

    Read more →
  • Rectified linear unit

    Rectified linear unit

    In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the non-negative part of its argument, i.e., the ramp function: ReLU ⁡ ( x ) = x + = max ( 0 , x ) = x + | x | 2 = { x if x > 0 , 0 x ≤ 0 {\displaystyle \operatorname {ReLU} (x)=x^{+}=\max(0,x)={\frac {x+|x|}{2}}={\begin{cases}x&{\text{if }}x>0,\\0&x\leq 0\end{cases}}} where x {\displaystyle x} is the input to a neuron. This is analogous to half-wave rectification in electrical engineering. ReLU is one of the most popular activation functions for artificial neural networks, and finds application in computer vision and speech recognition using deep neural nets and computational neuroscience. == History == The ReLU was first used by Alston Householder in 1941 as a mathematical abstraction of biological neural networks. Kunihiko Fukushima in 1969 used ReLU in the context of visual feature extraction in hierarchical neural networks. In 1998, Gregory Woodbury demonstrated that the rectified linear function could account for a broad range of emergent properties in the visual cortex. His work showed that a single unified model could drive the joint development of refined retinotopic maps, ocular dominance columns, and orientation selectivity. By utilizing the rectifier's "cutoff" property, Woodbury achieved a close quantitative fit to biological data, matching the spatial periodicities and topographic refinement patterns observed in macaque and cat cortical maps. Furthermore, he extended this framework to adult plasticity, accurately replicating the spatial and temporal dynamics of lesion-induced cortical reorganization. This research established that the rectified linear response was a necessary mechanism for the stable self-organisation and maintenance of complex, multi-feature neural maps. In 2000, Hahnloser et al. argued that ReLU approximates the biological relationship between neural firing rates and input current, in addition to enabling recurrent neural network dynamics to stabilise under weaker criteria. Prior to 2010, most activation functions used were the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more numerically efficient counterpart, the hyperbolic tangent. Around 2010, the use of ReLU became common again. Jarrett et al. (2009) noted that rectification by either absolute or ReLU (which they called "positive part") was critical for object recognition in convolutional neural networks (CNNs), specifically because it allows average pooling without neighboring filter outputs cancelling each other out. They hypothesized that the use of sigmoid or tanh was responsible for poor performance in previous CNNs. Nair and Hinton (2010) made a theoretical argument that the softplus activation function should be used, in that the softplus function numerically approximates the sum of an exponential number of linear models that share parameters. They then proposed ReLU as a good approximation to it. Specifically, they began by considering a single binary neuron in a Boltzmann machine that takes x {\displaystyle x} as input, and produces 1 as output with probability σ ( x ) = 1 1 + e − x {\displaystyle \sigma (x)={\frac {1}{1+e^{-x}}}} . They then considered extending its range of output by making infinitely many copies of it X 1 , X 2 , X 3 , … {\displaystyle X_{1},X_{2},X_{3},\dots } , that all take the same input, offset by an amount 0.5 , 1.5 , 2.5 , … {\displaystyle 0.5,1.5,2.5,\dots } , then their outputs are added together as ∑ i = 1 ∞ X i {\displaystyle \sum _{i=1}^{\infty }X_{i}} . They then demonstrated that ∑ i = 1 ∞ X i {\displaystyle \sum _{i=1}^{\infty }X_{i}} is approximately equal to N ( log ⁡ ( 1 + e x ) , σ ( x ) ) {\displaystyle {\mathcal {N}}(\log(1+e^{x}),\sigma (x))} , which is also approximately equal to ReLU ⁡ ( N ( x , σ ( x ) ) ) {\displaystyle \operatorname {ReLU} ({\mathcal {N}}(x,\sigma (x)))} , where N {\displaystyle {\mathcal {N}}} stands for the gaussian distribution. They also argued for another reason for using ReLU: that it allows "intensity equivariance" in image recognition. That is, multiplying input image by a constant k {\displaystyle k} multiplies the output also. In contrast, this is false for other activation functions like sigmoid or tanh. They found that ReLU activation allowed good empirical performance in restricted Boltzmann machines. Glorot et al (2011) argued that ReLU has the following advantages over sigmoid or tanh: ReLU is more similar to biological neurons' responses in their main operating regime. ReLU avoids vanishing gradients. ReLU is cheaper to compute. ReLU creates sparse representation naturally, because many hidden units output exactly zero for a given input. They also found empirically that deep networks trained with ReLU can achieve strong performance without unsupervised pre-training, especially on large, purely supervised tasks. In 2017, the rectified linear function became a central component of the transformer architecture introduced in the Vaswani et al paper "Attention Is All You Need". Within every transformer layer, ReLU is utilized in the position-wise feed-forward networks (FFN), defined by Equation 2 of their paper: FFN ⁡ ( x ) = max ( 0 , x W 1 + b 1 ) W 2 + b 2 {\displaystyle \operatorname {FFN} (x)=\max(0,xW_{1}+b_{1})W_{2}+b_{2}} This equation is foundational to the model's capacity; while the attention mechanism determines the relationships between tokens, the ReLU-based FFN performs the majority of the numerical computation and houses the bulk of the model's parameters. The efficiency and scalability of this rectified framework triggered a global technological revolution, enabling the development of Large Language Models that have had a profound economic impact. The industrial response to this architecture—including the massive expansion of AI-specific hardware and the birth of the generative AI sector—has positioned the Transformer as a cornerstone of 21st-century infrastructure. During the post 2017 period of rapid AI advancement, the rectified linear unit function has been key to achieving increased model performance and scaling due to the fact that it zeros out responses that are immaterial for a given stimuli, preventing them from accumulating in massive scale models. It is the complete silencing of the parts of the model found to be stimuli-irrelevant during learning that allows for scaling. As the stimuli-irrelevant proportion of the model becomes more massive, these highly numerous connections within the model would inevitably accumulate during scaling no matter how small each individual response is. Therefore, the rectified linear unit function, with its absolute zeroing property, enabled the scaling to hundred billion parameter models and beyond. Early Transformer scaling giants like GPT-3 (2020) and Falcon-180B (2023) relied on the rectified linear unit function explicitly, while successors such as GPT-4 (2023) and Llama 3 (2024) utilized smoother variants like GELU or SwiGLU. These variants were used to improve training stability while fundamentally preserving the rectified principle of zeroing low responses. At the centre of modern artificial intelligence ReLU and its variants maintain absolute zero response across the bulk of the model at any one time, while maintaining approximately linear reponses for stimuli-relevant connections enabling high performance on each specific cognitive task. This feature of activation sparsity has been critical for massive scaling and performance gains of AI models right up to the present day. == Advantages == Advantages of ReLU include: Sparse activation: for example, in a randomly initialized network, only about 50% of hidden units are activated (i.e. have a non-zero output). Better gradient propagation: fewer vanishing gradient problems compared to sigmoidal activation functions that saturate in both directions. Efficiency: only requires comparison and addition. Scale-invariant (homogeneous, or "intensity equivariance"): max ( 0 , a x ) = a max ( 0 , x ) for a ≥ 0 {\displaystyle \max(0,ax)=a\max(0,x){\text{ for }}a\geq 0} . == Potential problems == Possible downsides can include: Non-differentiability at zero (however, it is differentiable anywhere else, and the value of the derivative at zero can be chosen to be 0 or 1 arbitrarily). Not zero-centered: ReLU outputs are always non-negative. This can make it harder for the network to learn during backpropagation, because gradient updates tend to push weights in one direction (positive or negative). Batch normalization can help address this. ReLU is unbounded. Redundancy of the parametrization: Because ReLU is scale-invariant, the network computes the exact same function by scaling the weights and biases in front of a ReLU activation by k {\displaystyle k} , and the weights after by 1 / k {\displaystyle 1/k} . Dying ReLU: ReLU neurons can sometimes be pushed into states

    Read more →
  • Dynamic epistemic logic

    Dynamic epistemic logic

    Dynamic epistemic logic (DEL) is a logical framework dealing with knowledge and information change. Typically, DEL focuses on situations involving multiple agents and studies how their knowledge changes when events occur. These events can change factual properties of the actual world (they are called ontic events): for example a red card is painted in blue. They can also bring about changes of knowledge without changing factual properties of the world (they are called epistemic events): for example, a card is revealed publicly (or privately) to be red. Originally, DEL focused on epistemic events. Only some of the basic ideas are present in this entry of the original DEL framework; more details about DEL in general can be found in the references. Due to the nature of its object of study and its abstract approach, DEL is related and has applications to numerous research areas, such as computer science (artificial intelligence), philosophy (formal epistemology), economics (game theory) and cognitive science. In computer science, DEL is for example very much related to multi-agent systems, which are systems where multiple intelligent agents interact and exchange information. As a combination of dynamic logic and epistemic logic, dynamic epistemic logic is a young field of research. It really started in 1989 with Plaza's logic of public announcement. Independently, Gerbrandy and Groeneveld proposed a system dealing moreover with private announcement and that was inspired by the work of Veltman. Another system was proposed by van Ditmarsch whose main inspiration was the Cluedo game. But the most influential and original system was the system proposed by Baltag, Moss and Solecki. This system can deal with all the types of situations studied in the works above and its underlying methodology is conceptually grounded. This entry will present some of its basic ideas. Formally, DEL extends ordinary epistemic logic by the inclusion of event models to describe actions, and a product update operator that defines how epistemic models are updated as the consequence of executing actions described through event models. Epistemic logic will first be recalled. Then, actions and events will enter into the picture and we will introduce the DEL framework. == Epistemic logic == Epistemic logic is a modal logic dealing with the notions of knowledge and belief. As a logic, it is concerned with understanding the process of reasoning about knowledge and belief: which principles relating the notions of knowledge and belief are intuitively plausible? Like epistemology, it stems from the Greek word ϵ π ι σ τ η μ η {\displaystyle \epsilon \pi \iota \sigma \tau \eta \mu \eta } or ‘episteme’ meaning knowledge. Epistemology is nevertheless more concerned with analyzing the very nature and scope of knowledge, addressing questions such as “What is the definition of knowledge?” or “How is knowledge acquired?”. In fact, epistemic logic grew out of epistemology in the Middle Ages thanks to the efforts of Burley and Ockham. The formal work, based on modal logic, that inaugurated contemporary research into epistemic logic dates back only to 1962 and is due to Hintikka. It then sparked in the 1960s discussions about the principles of knowledge and belief and many axioms for these notions were proposed and discussed. For example, the interaction axioms K p → B p {\displaystyle Kp\rightarrow Bp} and B p → K B p {\displaystyle Bp\rightarrow KBp} are often considered to be intuitive principles: if an agent Knows p {\displaystyle p} then (s)he also Believes p {\displaystyle p} , or if an agent Believes p {\displaystyle p} , then (s)he Knows that (s)he Believes p {\displaystyle p} . More recently, these kinds of philosophical theories were taken up by researchers in economics, artificial intelligence and theoretical computer science where reasoning about knowledge is a central topic. Due to the new setting in which epistemic logic was used, new perspectives and new features such as computability issues were then added to the research agenda of epistemic logic. === Syntax === In the sequel, A G T S = { 1 , … , n } {\displaystyle AGTS=\{1,\ldots ,n\}} is a finite set whose elements are called agents and P R O P {\displaystyle PROP} is a set of propositional letters. The epistemic language is an extension of the basic multi-modal language of modal logic with a common knowledge operator C A {\displaystyle C_{A}} and a distributed knowledge operator D A {\displaystyle D_{A}} . Formally, the epistemic language L EL C {\displaystyle {\mathcal {L}}_{\textsf {EL}}^{C}} is defined inductively by the following grammar in BNF: L EL C : ϕ ::= p ∣ ¬ ϕ ∣ ( ϕ ∧ ϕ ) ∣ K j ϕ ∣ C A ϕ ∣ D A ϕ {\displaystyle {\mathcal {L}}_{\textsf {EL}}^{C}:\phi ~~::=~~p~\mid ~\neg \phi ~\mid ~(\phi \land \phi )~\mid ~K_{j}\phi ~\mid ~C_{A}\phi ~\mid ~D_{A}\phi } where p ∈ P R O P {\displaystyle p\in PROP} , j ∈ A G T S {\displaystyle j\in {AGTS}} and A ⊆ A G T S {\displaystyle A\subseteq {AGTS}} . The basic epistemic language L E L {\displaystyle {\mathcal {L}}_{EL}} is the language L E L C {\displaystyle {\mathcal {L}}_{EL}^{C}} without the common knowledge and distributed knowledge operators. The formula ⊥ {\displaystyle \bot } is an abbreviation for ¬ p ∧ p {\displaystyle \neg p\land p} (for a given p ∈ P R O P {\displaystyle p\in PROP} ), ⟨ K j ⟩ ϕ {\displaystyle \langle K_{j}\rangle \phi } is an abbreviation for ¬ K j ¬ ϕ {\displaystyle \neg K_{j}\neg \phi } , E A ϕ {\displaystyle E_{A}\phi } is an abbreviation for ⋀ j ∈ A K j ϕ {\displaystyle \bigwedge \limits _{j\in A}K_{j}\phi } and C ϕ {\displaystyle C\phi } an abbreviation for C A G T S ϕ {\displaystyle C_{AGTS}\phi } . Group notions: general, common and distributed knowledge. In a multi-agent setting there are three important epistemic concepts: general knowledge, distributed knowledge and common knowledge. The notion of common knowledge was first studied by Lewis in the context of conventions. It was then applied to distributed systems and to game theory, where it allows to express that the rationality of the players, the rules of the game and the set of players are commonly known. General knowledge. General knowledge of ϕ {\displaystyle \phi } means that everybody in the group of agents A G T S {\displaystyle {AGTS}} knows that ϕ {\displaystyle \phi } . Formally, this corresponds to the following formula: E ϕ := ⋀ j ∈ A G T S K j ϕ . {\displaystyle E\phi :={\underset {j\in {AGTS}}{\bigwedge }}K_{j}\phi .} Common knowledge. Common knowledge of ϕ {\displaystyle \phi } means that everybody knows ϕ {\displaystyle \phi } but also that everybody knows that everybody knows ϕ {\displaystyle \phi } , that everybody knows that everybody knows that everybody knows ϕ {\displaystyle \phi } , and so on ad infinitum. Formally, this corresponds to the following formula C ϕ := E ϕ ∧ E E ϕ ∧ E E E ϕ ∧ … {\displaystyle C\phi :=E\phi \land EE\phi \land EEE\phi \land \ldots } As we do not allow infinite conjunction the notion of common knowledge will have to be introduced as a primitive in our language. Before defining the language with this new operator, we are going to give an example introduced by Lewis that illustrates the difference between the notions of general knowledge and common knowledge. Lewis wanted to know what kind of knowledge is needed so that the statement p {\displaystyle p} : “every driver must drive on the right” be a convention among a group of agents. In other words, he wanted to know what kind of knowledge is needed so that everybody feels safe to drive on the right. Suppose there are only two agents i {\displaystyle i} and j {\displaystyle j} . Then everybody knowing p {\displaystyle p} (formally E p {\displaystyle Ep} ) is not enough. Indeed, it might still be possible that the agent i {\displaystyle i} considers possible that the agent j {\displaystyle j} does not know p {\displaystyle p} (formally ¬ K i K j p {\displaystyle \neg K_{i}K_{j}p} ). In that case the agent i {\displaystyle i} will not feel safe to drive on the right because he might consider that the agent j {\displaystyle j} , not knowing p {\displaystyle p} , could drive on the left. To avoid this problem, we could then assume that everybody knows that everybody knows that p {\displaystyle p} (formally E E p {\displaystyle EEp} ). This is again not enough to ensure that everybody feels safe to drive on the right. Indeed, it might still be possible that agent i {\displaystyle i} considers possible that agent j {\displaystyle j} considers possible that agent i {\displaystyle i} does not know p {\displaystyle p} (formally ¬ K i K j K i p {\displaystyle \neg K_{i}K_{j}K_{i}p} ). In that case and from i {\displaystyle i} ’s point of view, j {\displaystyle j} considers possible that i {\displaystyle i} , not knowing p {\displaystyle p} , will drive on the left. So from i {\displaystyle i} ’s point of view, j {\displaystyle j} might drive on the left as well (by the same argument as abov

    Read more →
  • Relief (feature selection)

    Relief (feature selection)

    Relief is an algorithm developed by Kenji Kira and Larry Rendell in 1992 that takes a filter-method approach to feature selection that is notably sensitive to feature interactions. It was originally designed for application to binary classification problems with discrete or numerical features. Relief calculates a feature score for each feature which can then be applied to rank and select top scoring features for feature selection. Alternatively, these scores may be applied as feature weights to guide downstream modeling. Relief feature scoring is based on the identification of feature value differences between nearest neighbor instance pairs. If a feature value difference is observed in a neighboring instance pair with the same class (a 'hit'), the feature score decreases. Alternatively, if a feature value difference is observed in a neighboring instance pair with different class values (a 'miss'), the feature score increases. The original Relief algorithm has since inspired a family of Relief-based feature selection algorithms (RBAs), including the ReliefF algorithm. Beyond the original Relief algorithm, RBAs have been adapted to (1) perform more reliably in noisy problems, (2) generalize to multi-class problems (3) generalize to numerical outcome (i.e. regression) problems, and (4) to make them robust to incomplete (i.e. missing) data. To date, the development of RBA variants and extensions has focused on four areas; (1) improving performance of the 'core' Relief algorithm, i.e. examining strategies for neighbor selection and instance weighting, (2) improving scalability of the 'core' Relief algorithm to larger feature spaces through iterative approaches, (3) methods for flexibly adapting Relief to different data types, and (4) improving Relief run efficiency. Their strengths are that they are not dependent on heuristics, they run in low-order polynomial time, and they are noise-tolerant and robust to feature interactions, as well as being applicable for binary or continuous data; however, it does not discriminate between redundant features, and low numbers of training instances fool the algorithm. == Relief Algorithm == Take a data set with n instances of p features, belonging to two known classes. Within the data set, each feature should be scaled to the interval [0 1] (binary data should remain as 0 and 1). The algorithm will be repeated m times. Start with a p-long weight vector (W) of zeros. At each iteration, take the feature vector (X) belonging to one random instance, and the feature vectors of the instance closest to X (by Euclidean distance) from each class. The closest same-class instance is called 'near-hit', and the closest different-class instance is called 'near-miss'. Update the weight vector such that W i = W i − ( x i − n e a r H i t i ) 2 + ( x i − n e a r M i s s i ) 2 , {\displaystyle W_{i}=W_{i}-(x_{i}-\mathrm {nearHit} _{i})^{2}+(x_{i}-\mathrm {nearMiss} _{i})^{2},} where i {\displaystyle i} indexes the components and runs from 1 to p. Thus the weight of any given feature decreases if it differs from that feature in nearby instances of the same class more than nearby instances of the other class, and increases in the reverse case. After m iterations, divide each element of the weight vector by m. This becomes the relevance vector. Features are selected if their relevance is greater than a threshold τ. Kira and Rendell's experiments showed a clear contrast between relevant and irrelevant features, allowing τ to be determined by inspection. However, it can also be determined by Chebyshev's inequality for a given confidence level (α) that a τ of 1/sqrt(αm) is good enough to make the probability of a Type I error less than α, although it is stated that τ can be much smaller than that. Relief was also described as generalizable to multinomial classification by decomposition into a number of binary problems. == ReliefF Algorithm == Kononenko et al. propose a number of updates to Relief. Firstly, they find the near-hit and near-miss instances using the Manhattan (L1) norm rather than the Euclidean (L2) norm, although the rationale is not specified. Furthermore, they found taking the absolute differences between xi and near-hiti, and xi and near-missi to be sufficient when updating the weight vector (rather than the square of those differences). === Reliable probability estimation === Rather than repeating the algorithm m times, implement it exhaustively (i.e. n times, once for each instance) for relatively small n (up to one thousand). Furthermore, rather than finding the single nearest hit and single nearest miss, which may cause redundant and noisy attributes to affect the selection of the nearest neighbors, ReliefF searches for k nearest hits and misses and averages their contribution to the weights of each feature. k can be tuned for any individual problem. === Incomplete data === In ReliefF, the contribution of missing values to the feature weight is determined using the conditional probability that two values should be the same or different, approximated with relative frequencies from the data set. This can be calculated if one or both features are missing. === Multi-class problems === Rather than use Kira and Rendell's proposed decomposition of a multinomial classification into a number of binomial problems, ReliefF searches for k near misses from each different class and averages their contributions for updating W, weighted with the prior probability of each class. == Other Relief-based Algorithm Extensions/Derivatives == The following RBAs are arranged chronologically from oldest to most recent. They include methods for improving (1) the core Relief algorithm concept, (2) iterative approaches for scalability, (3) adaptations to different data types, (4) strategies for computational efficiency, or (5) some combination of these goals. For more on RBAs see these book chapters or this most recent review paper. === RRELIEFF === Robnik-Šikonja and Kononenko propose further updates to ReliefF, making it appropriate for regression. === Relieved-F === Introduced deterministic neighbor selection approach and a new approach for incomplete data handling. === Iterative Relief === Implemented method to address bias against non-monotonic features. Introduced the first iterative Relief approach. For the first time, neighbors were uniquely determined by a radius threshold and instances were weighted by their distance from the target instance. === I-RELIEF === Introduced sigmoidal weighting based on distance from target instance. All instance pairs (not just a defined subset of neighbors) contributed to score updates. Proposed an on-line learning variant of Relief. Extended the iterative Relief concept. Introduced local-learning updates between iterations for improved convergence. === TuRF (a.k.a. Tuned ReliefF) === Specifically sought to address noise in large feature spaces through the recursive elimination of features and the iterative application of ReliefF. === Evaporative Cooling ReliefF === Similarly seeking to address noise in large feature spaces. Utilized an iterative `evaporative' removal of lowest quality features using ReliefF scores in association with mutual information. === EReliefF (a.k.a. Extended ReliefF) === Addressing issues related to incomplete and multi-class data. === VLSReliefF (a.k.a. Very Large Scale ReliefF) === Dramatically improves the efficiency of detecting 2-way feature interactions in very large feature spaces by scoring random feature subsets rather than the entire feature space. === ReliefMSS === Introduced calculation of feature weights relative to average feature 'diff' between instance pairs. === SURF === SURF identifies nearest neighbors (both hits and misses) based on a distance threshold from the target instance defined by the average distance between all pairs of instances in the training data. Results suggest improved power to detect 2-way epistatic interactions over ReliefF. === SURF (a.k.a. SURFStar) === SURF extends the SURF algorithm to not only utilized 'near' neighbors in scoring updates, but 'far' instances as well, but employing inverted scoring updates for 'far instance pairs. Results suggest improved power to detect 2-way epistatic interactions over SURF, but an inability to detect simple main effects (i.e. univariate associations). === SWRF === SWRF extends the SURF algorithm adopting sigmoid weighting to take distance from the threshold into account. Also introduced a modular framework for further developing RBAs called MoRF. === MultiSURF (a.k.a. MultiSURFStar) === MultiSURF extends the SURF algorithm adapting the near/far neighborhood boundaries based on the average and standard deviation of distances from the target instance to all others. MultiSURF uses the standard deviation to define a dead-band zone where 'middle-distance' instances do not contribute to scoring. Evidence suggests MultiSURF performs best in detecting pure 2-way feature interactions. === Reli

    Read more →
  • Prescription monitoring program

    Prescription monitoring program

    In the United States, prescription monitoring programs (PMPs) or prescription drug monitoring programs (PDMPs) are state-run programs which collect and distribute data about the prescription and dispensation of federally controlled substances and, depending on state requirements, other potentially abusable prescription drugs. PMPs are meant to help prevent adverse drug-related events such as opioid overdoses, drug diversion, and substance abuse by decreasing the amount and/or frequency of opioid prescribing, and by identifying those patients who are obtaining prescriptions from multiple providers (i.e., "doctor shopping") or those physicians overprescribing opioids. Most US health care workers support the idea of PMPs, which intend to assist physicians, physician assistants, nurse practitioners, dentists and other prescribers, the pharmacists, chemists and support staff of dispensing establishments. The database, whose use is required by State law, typically requires prescribers and pharmacies dispensing controlled substances to register with their respective state PMPs and (for pharmacies and providers who dispense from their offices) to report the dispensation of such prescriptions to an electronic online database. The majority of PMPs are authorized to notify law enforcement agencies or licensing boards or physicians when a prescriber, or patients receiving prescriptions, exceed thresholds established by the state or prescription recipient exceeds thresholds established by the State. All states have implemented PDMPs, although evidence for the effectiveness of these programs is mixed. While prescription of opioids has decreased with PMP use, overdose deaths in many states have actually increased, with those states sharing data with neighboring jurisdictions or requiring reporting of more drugs experiencing highest increases in deaths. This may be because those declined opioid prescriptions turn to street drugs, whose potency and contaminants carry greater overdose risk. == History == Prescription drug monitoring programs, or PDMPs, are an example of one initiative proposed to alleviate effects of the opioid crisis. The programs are designed to restrict prescription drug abuse by limiting a patient's ability to obtain similar prescriptions from multiple providers (i.e. “doctor shopping”) and reducing diversion of controlled substances. This is meant to reduce risk of fatal overdose caused by high doses of opioids or interactions between opioids and benzodiazepenes, and to enable better decision making on the part of healthcare providers who may be unaware of a patient's prescription drug use, history or other prescriptions. PDMPs have been implemented in state legislations since 1939 in California, a time before electronic medical records, though implementation rose alongside increased awareness of overprescribing of opioids and overdose. A later New York state program was struck down by the U.S. Supreme Court in Whalen v. Roe. But, by 2019, 49 states, the District of Columbia, and Guam had enacted PDMP legislation. In 2021 Missouri, the last State to not use a PMP, adopted legislation to create one. PMPs are constantly being updated to increase speed of data collection, sharing of data across States, and ease of interpretation. This is being done by integrating PDMP reports with other health information technologies such as health information exchanges (HIE), electronic health record (EHR) systems, and/ or pharmacy dispensing software systems. One program that has been implemented in nine states is called the PDMP Electronic Health Records Integration and Interoperability Expansion, also known as PEHRIIE. Another software, marketed by Bamboo Health and integrated with PMPs in 43 states, uses an algorithm to track factors thought to increase risk of diversion, abuse or overdose, and assigns patients a three digit score based on presumed indicators of risk. While some studies have suggested that PDMP-HIT integration and sharing of interstate data brings benefits such as reduced opioid-related inpatient morbidity, others have found no or negative impact on mortality compared to states without PMP data sharing. Patient and media reports suggest need for testing and evaluation of algorithmic software used to score risk, with some patients reporting denial of prescriptions without c explanation or clarity of data. == Goals == Most health care workers support PMPs which intend to assist physicians, physician assistants, nurse practitioners, dentists and other prescribers, the pharmacists, chemists and support staff of dispensing establishments, as well as law-enforcement agencies. The collaboration supports the legitimate medical use of controlled substances while limiting their abuse and diversion. Pharmacies dispensing controlled substances and prescribers typically must register with their respective state PMPs and (for pharmacies and providers who dispense controlled substances from their offices) report the dispensation to an electronic online database. Some pharmacy software can submit these reports automatically to multiple states. == Usage == === List of programs by state === === Software systems === NarxCare is a prescription drug monitoring program (PDMP) run by Bamboo Health. Bamboo Health was formerly known as Appriss. It is widely used across the United States by pharmacies including Rite Aid as well as those at Walmart and Sam’s Club. The NarxCare software allows doctors to view data about a patient, combining data from the prescription registries of various U.S. states to make the registries interoperable nationally. It also uses machine learning to generate an "Overdose Risk Score" that potentially includes EMS and criminal justice data; these scores have been criticized by researchers and patient advocates for the lack of transparency in the process as well as the potential for disparate treatment of women and minority groups. Advertised as an "analytics tool and care management platform", the NarxCare software allows doctors to view data about a patient including how many pharmacies they have visited and the combinations of medication they are prescribed. It combines data from the prescription registries of various U.S. states, making the registries interoperable nationally. It additionally uses machine learning to generate various three-digit "risk scores" and an overall "Overdose Risk Score", collectively referred to as Narx Scores, in a process that potentially includes EMS and criminal justice data as well as court records. == Controversy == Many doctors and researchers support the idea of PDMPs as a tool in combatting the opioid epidemic. Opioid prescribing, opioid diversion and supply, opioid misuse, and opioid-related morbidity and mortality are common elements in data entered into PDMPs. Prescription Monitoring Programs are purported to offer economic benefits for the states who implement them by decreasing overall health care costs, lost productivity, and investigation times. However, there are many studies that conclude the impact of PDMPs is unclear. While use of PMPs has been accompanied by decrease in opioid prescribing, few analyses consider corresponding use of street opioids, extramedical use, or diversion, which might provide a more holistic method for evaluation of PMP intent and efficacy. Evidence for PDMP impact on fatal overdoses is decidedly mixed, with multiple studies finding increased overdose rates in some states, decreases in others, or no clear impact. Interestingly, an increase in heroin overdoses after PDMP implementation has been commonly reported, presumably as denial of prescription opioids sends patients in search of street drugs. Narx Scores have been criticized by researchers and patient advocates for the lack of transparency in the generation process as well as the potential for disparate treatment of women and minority groups. Writing in Duke Law Journal, Jennifer Oliva stated that "black-box algorithms" are used to generate the scores.

    Read more →
  • Tensor sketch

    Tensor sketch

    In statistics, machine learning and algorithms, a tensor sketch is a type of dimensionality reduction that is particularly efficient when applied to vectors that have tensor structure. Such a sketch can be used to speed up explicit kernel methods, bilinear pooling in neural networks and is a cornerstone in many numerical linear algebra algorithms. == Mathematical definition == Mathematically, a dimensionality reduction or sketching matrix is a matrix M ∈ R k × d {\displaystyle M\in \mathbb {R} ^{k\times d}} , where k < d {\displaystyle k Read more →

  • Corona-Warn-App

    Corona-Warn-App

    Corona-Warn-App was the official and open-source COVID-19 contact tracing app used for digital contact tracing in Germany made by SAP and Deutsche Telekom subsidiary T-Systems. It had been downloaded 22.8 million times as of 19 November 2020 and 26.2 million times as of 18 March 2021. The app has been promoted by billboard and broadcast advertisements, e.g. in cooperation with the German Football Association (DFB) and other prominent companies. The German government has announced that the app would no longer exchange tracing information as of April 30, 2023 & would enter hibernation as of June 1, 2023. == Effectiveness == Experts believe that time saved by using the app can be critical for improving the effectiveness contact tracing efforts. Some virologists say when at least 60% of people in Germany use it, it would be very effective. == Functioning == The app works with the Exposure Notification Framework (what is implemented in Google Play Services for Android and in iOS) by using Bluetooth to exchange codes with app users that are within 1.5 meters of each other for a period of at least 10 minutes. Anyone who tests positive for COVID-19 can share this information voluntarily with the app. Other app users are then notified about when, how long and at what distance they had contact with the infected person within a 14-day period. Testing is available for persons on a voluntary basis. === Server architecture === Based on the Client–server model five servers are operated within the app backend: the Corona-Warn-App server. It stores the authorized keys of infected users, referred to as diagnosis keys, from the past 14 days in its database. Stored diagnosis keys are grouped into regularly updated blocks which are transmitted to the Content Delivery Network. This interface supplies the keys for the app clients to download and locally compute a potential exposure risk. the Verification server. It is responsible for documenting the approval of the user to share their positive test result with the app and also to verify the test result. the Portal Server. It generates a so-called teleTAN token if the user did not give their consent to share their test result with the app at first but then changed their mind or if the local public health authority or test laboratory is not connected to the app system yet. the Test Result Server. It saves the test results provided by the local public health authorities or test laboratories for further use within the backend. the Federation Gateway Server. It connects to the national Corona-Warn-App servers of participating EU countries to enable transnational key exchange. By the distribution of the data on different servers the decoupling of the data becomes possible and results in an obstructed tracing of the app users. ==== Report of a positive COVID-19 test ==== The app provides a function to warn other app users by uploading their positive test result on a voluntarily and anonymous basis to the Corona-Warn-App server. In case the local public health authority or test laboratory is already connected to the app system, the user receives a QR-Code when the swab specimen is taken that can be scanned in the app. After scanning the QR-Code und the user getting authorized by the Verification server, the app receives an individual Registration token which gets stored locally and with which the status and the result of the test can be checked manually as well as automatically. If the local public health authority or test laboratory is not connected to the app system yet and the user wants to share their positive test result with other app users, it is required to request a teleTAN token by calling the verification hotline of the app. In both cases, the user can upload their diagnosis keys of the last 14 days to the Corona-Warn-App server in case their consent to share the information is given. The Corona-Warn-App server then verifies the uploaded keys by asking the Verification server if the keys are valid and if they are, the Corona-Warn-App server stores them in its database. == Privacy == The use of the app is voluntary. The app implements decentralized data storage to ensure data privacy. Employers can require that Corona-Warn be installed on company phones, but can not compel its use on private phones. == Funding == The open source app, which costs €20 million to develop is intended to supplement human contact tracing efforts, which Germany put in place during the early stages of the COVID-19 pandemic in Germany. In August 2022, a spokesperson for the German ministry of health announced that the total costs including all additional developments are now estimated to be closer to €150m. == Interoperability == At its start the app only worked in Germany, and Jens Spahn, than Federal Minister of Health (CDU), has said the development of a Europe-wide system is a future goal. With the update published on 19 October 2020 the app supports key-exchanges with the EU Interoperability Gateway and is therefore able to communicate with contact tracing apps from Ireland and Italy. Austria, Belgium, Czech Republic, Croatia, Cyprus, Denmark, Finland, Ireland, Italy, Latvia, Malta, Netherlands, Norway, Poland, Slovenia, Spain and Switzerland had joined the gateway as well and are also able to exchange keys with Corona-Warn-App. The app can be downloaded in many App stores outside of Germany. However, as of August 2021, the app is still unavailable for those of notable national German minorities like Turks, Russians or Ukrainians, who use App stores of their home countries. == Software variants == An unofficial Corona-Warn-App has been released on F-Droid, making the app available without proprietary components on Android phones. == Literature == Thomas Köllmann: Die Corona-Warn-App – Schnittstelle zwischen Datenschutz- und Arbeitsrecht. In: Neue Zeitschrift für Arbeitsrecht. Nr. 13, 10. Juli 2020, S. 831–836.

    Read more →
  • Deterministic blockmodeling

    Deterministic blockmodeling

    Deterministic blockmodeling is an approach in blockmodeling that does not assume a probabilistic model, and instead relies on the exact or approximate algorithms, which are used to find blockmodel(s). This approach typically minimizes some inconsistency that can occur with the ideal block structure. Such analysis is focused on clustering (grouping) of the network (or adjacency matrix) that is obtained with minimizing an objective function, which measures discrepancy from the ideal block structure. However, some indirect approaches (or methods between direct and indirect approaches, such as CONCOR) do not explicitly minimize inconsistencies or optimize some criterion function. This approach was popularized in the 1970s, due to the presence of two computer packages (CONCOR and STRUCTURE) that were used to "find a permutation of the rows and columns in the adjacency matrix leading to an approximate block structure". The opposite approach to deterministic blockmodeling is a stochastic blockmodeling approach.

    Read more →
  • Evolutionary programming

    Evolutionary programming

    Evolutionary programming is an evolutionary algorithm, where a share of new population is created by mutation of previous population without crossover. Evolutionary programming differs from evolution strategy ES( μ + λ {\displaystyle \mu +\lambda } ) in one detail. All individuals are selected for the new population, while in ES( μ + λ {\displaystyle \mu +\lambda } ), every individual has the same probability to be selected. It is one of the four major evolutionary algorithm paradigms. == History == It was first used by Lawrence J. Fogel in the US in 1960 in order to use simulated evolution as a learning process aiming to generate artificial intelligence. It was used to evolve finite-state machines as predictors.

    Read more →
  • Rectified linear unit

    Rectified linear unit

    In the context of artificial neural networks, the rectifier or ReLU (rectified linear unit) activation function is an activation function defined as the non-negative part of its argument, i.e., the ramp function: ReLU ⁡ ( x ) = x + = max ( 0 , x ) = x + | x | 2 = { x if x > 0 , 0 x ≤ 0 {\displaystyle \operatorname {ReLU} (x)=x^{+}=\max(0,x)={\frac {x+|x|}{2}}={\begin{cases}x&{\text{if }}x>0,\\0&x\leq 0\end{cases}}} where x {\displaystyle x} is the input to a neuron. This is analogous to half-wave rectification in electrical engineering. ReLU is one of the most popular activation functions for artificial neural networks, and finds application in computer vision and speech recognition using deep neural nets and computational neuroscience. == History == The ReLU was first used by Alston Householder in 1941 as a mathematical abstraction of biological neural networks. Kunihiko Fukushima in 1969 used ReLU in the context of visual feature extraction in hierarchical neural networks. In 1998, Gregory Woodbury demonstrated that the rectified linear function could account for a broad range of emergent properties in the visual cortex. His work showed that a single unified model could drive the joint development of refined retinotopic maps, ocular dominance columns, and orientation selectivity. By utilizing the rectifier's "cutoff" property, Woodbury achieved a close quantitative fit to biological data, matching the spatial periodicities and topographic refinement patterns observed in macaque and cat cortical maps. Furthermore, he extended this framework to adult plasticity, accurately replicating the spatial and temporal dynamics of lesion-induced cortical reorganization. This research established that the rectified linear response was a necessary mechanism for the stable self-organisation and maintenance of complex, multi-feature neural maps. In 2000, Hahnloser et al. argued that ReLU approximates the biological relationship between neural firing rates and input current, in addition to enabling recurrent neural network dynamics to stabilise under weaker criteria. Prior to 2010, most activation functions used were the logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more numerically efficient counterpart, the hyperbolic tangent. Around 2010, the use of ReLU became common again. Jarrett et al. (2009) noted that rectification by either absolute or ReLU (which they called "positive part") was critical for object recognition in convolutional neural networks (CNNs), specifically because it allows average pooling without neighboring filter outputs cancelling each other out. They hypothesized that the use of sigmoid or tanh was responsible for poor performance in previous CNNs. Nair and Hinton (2010) made a theoretical argument that the softplus activation function should be used, in that the softplus function numerically approximates the sum of an exponential number of linear models that share parameters. They then proposed ReLU as a good approximation to it. Specifically, they began by considering a single binary neuron in a Boltzmann machine that takes x {\displaystyle x} as input, and produces 1 as output with probability σ ( x ) = 1 1 + e − x {\displaystyle \sigma (x)={\frac {1}{1+e^{-x}}}} . They then considered extending its range of output by making infinitely many copies of it X 1 , X 2 , X 3 , … {\displaystyle X_{1},X_{2},X_{3},\dots } , that all take the same input, offset by an amount 0.5 , 1.5 , 2.5 , … {\displaystyle 0.5,1.5,2.5,\dots } , then their outputs are added together as ∑ i = 1 ∞ X i {\displaystyle \sum _{i=1}^{\infty }X_{i}} . They then demonstrated that ∑ i = 1 ∞ X i {\displaystyle \sum _{i=1}^{\infty }X_{i}} is approximately equal to N ( log ⁡ ( 1 + e x ) , σ ( x ) ) {\displaystyle {\mathcal {N}}(\log(1+e^{x}),\sigma (x))} , which is also approximately equal to ReLU ⁡ ( N ( x , σ ( x ) ) ) {\displaystyle \operatorname {ReLU} ({\mathcal {N}}(x,\sigma (x)))} , where N {\displaystyle {\mathcal {N}}} stands for the gaussian distribution. They also argued for another reason for using ReLU: that it allows "intensity equivariance" in image recognition. That is, multiplying input image by a constant k {\displaystyle k} multiplies the output also. In contrast, this is false for other activation functions like sigmoid or tanh. They found that ReLU activation allowed good empirical performance in restricted Boltzmann machines. Glorot et al (2011) argued that ReLU has the following advantages over sigmoid or tanh: ReLU is more similar to biological neurons' responses in their main operating regime. ReLU avoids vanishing gradients. ReLU is cheaper to compute. ReLU creates sparse representation naturally, because many hidden units output exactly zero for a given input. They also found empirically that deep networks trained with ReLU can achieve strong performance without unsupervised pre-training, especially on large, purely supervised tasks. In 2017, the rectified linear function became a central component of the transformer architecture introduced in the Vaswani et al paper "Attention Is All You Need". Within every transformer layer, ReLU is utilized in the position-wise feed-forward networks (FFN), defined by Equation 2 of their paper: FFN ⁡ ( x ) = max ( 0 , x W 1 + b 1 ) W 2 + b 2 {\displaystyle \operatorname {FFN} (x)=\max(0,xW_{1}+b_{1})W_{2}+b_{2}} This equation is foundational to the model's capacity; while the attention mechanism determines the relationships between tokens, the ReLU-based FFN performs the majority of the numerical computation and houses the bulk of the model's parameters. The efficiency and scalability of this rectified framework triggered a global technological revolution, enabling the development of Large Language Models that have had a profound economic impact. The industrial response to this architecture—including the massive expansion of AI-specific hardware and the birth of the generative AI sector—has positioned the Transformer as a cornerstone of 21st-century infrastructure. During the post 2017 period of rapid AI advancement, the rectified linear unit function has been key to achieving increased model performance and scaling due to the fact that it zeros out responses that are immaterial for a given stimuli, preventing them from accumulating in massive scale models. It is the complete silencing of the parts of the model found to be stimuli-irrelevant during learning that allows for scaling. As the stimuli-irrelevant proportion of the model becomes more massive, these highly numerous connections within the model would inevitably accumulate during scaling no matter how small each individual response is. Therefore, the rectified linear unit function, with its absolute zeroing property, enabled the scaling to hundred billion parameter models and beyond. Early Transformer scaling giants like GPT-3 (2020) and Falcon-180B (2023) relied on the rectified linear unit function explicitly, while successors such as GPT-4 (2023) and Llama 3 (2024) utilized smoother variants like GELU or SwiGLU. These variants were used to improve training stability while fundamentally preserving the rectified principle of zeroing low responses. At the centre of modern artificial intelligence ReLU and its variants maintain absolute zero response across the bulk of the model at any one time, while maintaining approximately linear reponses for stimuli-relevant connections enabling high performance on each specific cognitive task. This feature of activation sparsity has been critical for massive scaling and performance gains of AI models right up to the present day. == Advantages == Advantages of ReLU include: Sparse activation: for example, in a randomly initialized network, only about 50% of hidden units are activated (i.e. have a non-zero output). Better gradient propagation: fewer vanishing gradient problems compared to sigmoidal activation functions that saturate in both directions. Efficiency: only requires comparison and addition. Scale-invariant (homogeneous, or "intensity equivariance"): max ( 0 , a x ) = a max ( 0 , x ) for a ≥ 0 {\displaystyle \max(0,ax)=a\max(0,x){\text{ for }}a\geq 0} . == Potential problems == Possible downsides can include: Non-differentiability at zero (however, it is differentiable anywhere else, and the value of the derivative at zero can be chosen to be 0 or 1 arbitrarily). Not zero-centered: ReLU outputs are always non-negative. This can make it harder for the network to learn during backpropagation, because gradient updates tend to push weights in one direction (positive or negative). Batch normalization can help address this. ReLU is unbounded. Redundancy of the parametrization: Because ReLU is scale-invariant, the network computes the exact same function by scaling the weights and biases in front of a ReLU activation by k {\displaystyle k} , and the weights after by 1 / k {\displaystyle 1/k} . Dying ReLU: ReLU neurons can sometimes be pushed into states

    Read more →
  • Comparison of operating systems

    Comparison of operating systems

    These tables provide a comparison of operating systems, of computer devices, as listing general and technical information for a number of widely used and currently available PC or handheld (including smartphone and tablet computer) operating systems. The article "Usage share of operating systems" provides a broader, and more general, comparison of operating systems that includes servers, mainframes and supercomputers. Because of the large number and variety of available Linux distributions, they are all grouped under a single entry; see comparison of Linux distributions for a detailed comparison. There is also a variety of BSD and DOS operating systems, covered in comparison of BSD operating systems and comparison of DOS operating systems. == Nomenclature == The nomenclature for operating systems varies among providers and sometimes within providers. For purposes of this article the terms used are; kernel In some operating systems, the OS is split into a low level region called the kernel and higher level code that relies on the kernel. Typically the kernel implements processes but its code does not run as part of a process. hybrid kernel monolithic kernel Nucleus In some operating systems there is OS code permanently present in a contiguous region of memory addressable by unprivileged code; in IBM systems this is typically referred to as the nucleus. The nucleus typically contains both code that requires special privileges and code that can run in an unprivileged state. Typically some code in the nucleus runs in the context of a dispatching unit, e.g., address space, process, task, thread, while other code runs independent of any dispatching unit. In contemporary operating systems unprivileged applications cannot alter the nucleus. License and pricing policies vary widely among different systems. Among others, the tables below use the following terms: BSD BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. bundled The fee is included in the price of the hardware == General information == == Technical information == == Security == == Commands == For POSIX compliant (or partly compliant) systems like FreeBSD, Linux, macOS or Solaris, the basic commands are the same because they are standardized. NOTE: Linux systems may vary by distribution which specific program, or even 'command' is called, via the POSIX alias function. For example, if you wanted to use the DOS dir to give you a directory listing with one detailed file listing per line you could use alias dir='ls -lahF' (e.g. in a session configuration file).

    Read more →
  • Locality-sensitive hashing

    Locality-sensitive hashing

    In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. The number of buckets is much smaller than the universe of possible input items. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items. Hashing-based approximate nearest-neighbor search algorithms generally use one of two main categories of hashing methods: either data-independent methods, such as locality-sensitive hashing (LSH); or data-dependent methods, such as locality-preserving hashing (LPH). Locality-preserving hashing was initially devised as a way to facilitate data pipelining in implementations of massively parallel algorithms that use randomized routing and universal hashing to reduce memory contention and network congestion. == Definitions == A finite family F {\displaystyle {\mathcal {F}}} of functions h : M → S {\displaystyle h\colon M\to S} is defined to be an LSH family for a metric space M = ( M , d ) {\displaystyle {\mathcal {M}}=(M,d)} , a threshold r > 0 {\displaystyle r>0} , an approximation factor c > 1 {\displaystyle c>1} , and probabilities p 1 > p 2 {\displaystyle p_{1}>p_{2}} if it satisfies the following condition. For any two points a , b ∈ M {\displaystyle a,b\in M} and a hash function h {\displaystyle h} chosen uniformly at random from F {\displaystyle {\mathcal {F}}} : If d ( a , b ) ≤ r {\displaystyle d(a,b)\leq r} , then h ( a ) = h ( b ) {\displaystyle h(a)=h(b)} (i.e., a and b collide) with probability at least p 1 {\displaystyle p_{1}} , If d ( a , b ) ≥ c r {\displaystyle d(a,b)\geq cr} , then h ( a ) = h ( b ) {\displaystyle h(a)=h(b)} with probability at most p 2 {\displaystyle p_{2}} . Such a family F {\displaystyle {\mathcal {F}}} is called ( r , c r , p 1 , p 2 ) {\displaystyle (r,cr,p_{1},p_{2})} -sensitive. === LSH with respect to a similarity measure === Alternatively it is possible to define an LSH family on a universe of items U endowed with a similarity function ϕ : U × U → [ 0 , 1 ] {\displaystyle \phi \colon U\times U\to [0,1]} . In this setting, a LSH scheme is a family of hash functions H coupled with a probability distribution D over H such that a function h ∈ H {\displaystyle h\in H} chosen according to D satisfies P r [ h ( a ) = h ( b ) ] = ϕ ( a , b ) {\displaystyle Pr[h(a)=h(b)]=\phi (a,b)} for each a , b ∈ U {\displaystyle a,b\in U} . === Amplification === Given a ( d 1 , d 2 , p 1 , p 2 ) {\displaystyle (d_{1},d_{2},p_{1},p_{2})} -sensitive family F {\displaystyle {\mathcal {F}}} , we can construct new families G {\displaystyle {\mathcal {G}}} by either the AND-construction or OR-construction of F {\displaystyle {\mathcal {F}}} . To create an AND-construction, we define a new family G {\displaystyle {\mathcal {G}}} of hash functions g, where each function g is constructed from k random functions h 1 , … , h k {\displaystyle h_{1},\ldots ,h_{k}} from F {\displaystyle {\mathcal {F}}} . We then say that for a hash function g ∈ G {\displaystyle g\in {\mathcal {G}}} , g ( x ) = g ( y ) {\displaystyle g(x)=g(y)} if and only if all h i ( x ) = h i ( y ) {\displaystyle h_{i}(x)=h_{i}(y)} for i = 1 , 2 , … , k {\displaystyle i=1,2,\ldots ,k} . Since the members of F {\displaystyle {\mathcal {F}}} are independently chosen for any g ∈ G {\displaystyle g\in {\mathcal {G}}} , G {\displaystyle {\mathcal {G}}} is a ( d 1 , d 2 , p 1 k , p 2 k ) {\displaystyle (d_{1},d_{2},p_{1}^{k},p_{2}^{k})} -sensitive family. To create an OR-construction, we define a new family G {\displaystyle {\mathcal {G}}} of hash functions g, where each function g is constructed from k random functions h 1 , … , h k {\displaystyle h_{1},\ldots ,h_{k}} from F {\displaystyle {\mathcal {F}}} . We then say that for a hash function g ∈ G {\displaystyle g\in {\mathcal {G}}} , g ( x ) = g ( y ) {\displaystyle g(x)=g(y)} if and only if h i ( x ) = h i ( y ) {\displaystyle h_{i}(x)=h_{i}(y)} for one or more values of i. Since the members of F {\displaystyle {\mathcal {F}}} are independently chosen for any g ∈ G {\displaystyle g\in {\mathcal {G}}} , G {\displaystyle {\mathcal {G}}} is a ( d 1 , d 2 , 1 − ( 1 − p 1 ) k , 1 − ( 1 − p 2 ) k ) {\displaystyle (d_{1},d_{2},1-(1-p_{1})^{k},1-(1-p_{2})^{k})} -sensitive family. == Applications == LSH has been applied to several problem domains, including: Near-duplicate detection Hierarchical clustering Genome-wide association study Image similarity identification VisualRank Gene expression similarity identification Audio similarity identification Nearest neighbor search Audio fingerprint Digital video fingerprinting Shared memory organization in parallel computing Physical data organization in database management systems Training fully connected neural networks Computer security Machine learning == Methods == === Bit sampling for Hamming distance === One of the easiest ways to construct an LSH family is by bit sampling. This approach works for the Hamming distance over d-dimensional vectors { 0 , 1 } d {\displaystyle \{0,1\}^{d}} . Here, the family F {\displaystyle {\mathcal {F}}} of hash functions is simply the family of all the projections of points on one of the d {\displaystyle d} coordinates, i.e., F = { h : { 0 , 1 } d → { 0 , 1 } ∣ h ( x ) = x i for some i ∈ { 1 , … , d } } {\displaystyle {\mathcal {F}}=\{h\colon \{0,1\}^{d}\to \{0,1\}\mid h(x)=x_{i}{\text{ for some }}i\in \{1,\ldots ,d\}\}} , where x i {\displaystyle x_{i}} is the i {\displaystyle i} th coordinate of x {\displaystyle x} . A random function h {\displaystyle h} from F {\displaystyle {\mathcal {F}}} simply selects a random bit from the input point. This family has the following parameters: P 1 = 1 − R / d {\displaystyle P_{1}=1-R/d} , P 2 = 1 − c R / d {\displaystyle P_{2}=1-cR/d} . That is, any two vectors x , y {\displaystyle x,y} with Hamming distance at most R {\displaystyle R} collide under a random h {\displaystyle h} with probability at least P 1 {\displaystyle P_{1}} . Any x , y {\displaystyle x,y} with Hamming distance at least c R {\displaystyle cR} collide with probability at most P 2 {\displaystyle P_{2}} . === Min-wise independent permutations === Suppose U is composed of subsets of some ground set of enumerable items S and the similarity function of interest is the Jaccard index J. If π is a permutation on the indices of S, for A ⊆ S {\displaystyle A\subseteq S} let h ( A ) = min a ∈ A { π ( a ) } {\displaystyle h(A)=\min _{a\in A}\{\pi (a)\}} . Each possible choice of π defines a single hash function h mapping input sets to elements of S. Define the function family H to be the set of all such functions and let D be the uniform distribution. Given two sets A , B ⊆ S {\displaystyle A,B\subseteq S} the event that h ( A ) = h ( B ) {\displaystyle h(A)=h(B)} corresponds exactly to the event that the minimizer of π over A ∪ B {\displaystyle A\cup B} lies inside A ∩ B {\displaystyle A\cap B} . As h was chosen uniformly at random, P r [ h ( A ) = h ( B ) ] = J ( A , B ) {\displaystyle Pr[h(A)=h(B)]=J(A,B)\,} and ( H , D ) {\displaystyle (H,D)\,} define an LSH scheme for the Jaccard index. Because the symmetric group on n elements has size n!, choosing a truly random permutation from the full symmetric group is infeasible for even moderately sized n. Because of this fact, there has been significant work on finding a family of permutations that is "min-wise independent" — a permutation family for which each element of the domain has equal probability of being the minimum under a randomly chosen π. It has been established that a min-wise independent family of permutations is at least of size lcm ⁡ { 1 , 2 , … , n } ≥ e n − o ( n ) {\displaystyle \operatorname {lcm} \{\,1,2,\ldots ,n\,\}\geq e^{n-o(n)}} , and that this bound is tight. Because min-wise independent families are too big for practical applications, two variant notions of min-wise independence are introduced: restricted min-wise independent permutations families, and approximate min-wise independent families. Restricted min-wise independence is the min-wise independence property restricted to certain sets of cardinality at most k. Approximate min-wise independence differs from the property by at most a fixed ε. === Open source methods === ==== Nilsimsa Hash ==== Nilsimsa is a locality-sensitive hashing algorithm used in anti-spam efforts. The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. The paper suggests that the Nilsimsa satisfies three requirements: The digest identifying each message should not

    Read more →
  • Genotypic and phenotypic repair

    Genotypic and phenotypic repair

    Genotypic and phenotypic repair are optional components of an evolutionary algorithm (EA). An EA reproduces essential elements of biological evolution as a computer algorithm in order to solve demanding optimization or planning tasks, at least approximately. A candidate solution is represented by a - usually linear - data structure that plays the role of an individual's chromosome. New solution candidates are generated by mutation and crossover operators following the example of biology. These offspring may be defective, which is corrected or compensated for by genotypic or phenotypic repair. == Description == Genotypic repair, also known as genetic repair, is the removal or correction of impermissible entries in the chromosome that violate restrictions. In phenotypic repair, the corrections are only made in the genotype-phenotype mapping and the chromosome remains unchanged. Michalewicz wrote about the importance of restrictions in real-world applications: "In general, constraints are an integral part of the formulation of any problem". Restriction violations are application-specific and therefore it depends on the current problem whether and which type of repair is useful. They can usually also be treated by a correspondingly extended evaluation and it depends on the problem which measures are possible and which is the most suitable. If a phenotypic repair is feasible, then it is usually the most efficient compared to the other measures. A survey on repair methods used as constraint handling techniques can be found in. Violations of the range limits of genes should be avoided as far as possible by the formulation of the genome. If this is not possible or if restrictions within the search space defined by the genome are involved, their violations are usually handled by the evaluation. This can be done, for example, by penalty functions that lower the fitness. Repair is often also required for combinatorial tasks. The application of a 1- or n-point crossover operator can, for example, lead to genes being missing in one of the child genomes that are present in duplicate in the other. In this case, a suitable genotypic repair measure is to move the surplus genes to the other genome in a positional manner. The use of the aforementioned operators in combinatorial tasks has also proven to be useful in combination with crossover types specially developed for permutations, at least for certain problems. Particularly in combinatorial problems, it has been observed that genotypic repair can promote premature convergence to a suboptimum, but can also significantly accelerate a successful search. Studies on various tasks have shown that this is application-dependent. An effective measure to avoid premature convergence is generally the use of structured populations instead of the usual panmictic ones. Sequence restrictions play a role in many scheduling tasks, for example when it comes to planning workflows. If, for example, it is specified that step A must be carried out before step B and the gene of step B is located before the gene of A in the chromosome, then there is an impermissible gene sequence. This is because the scheduling operation of step B requires the planned end of step A for correct scheduling, but this is not yet scheduled at the time gene B is processed. The problem can be solved in two ways: The scheduling operation of step B is postponed until the gene from step A has been processed. The genome remains unchanged and the repair only influences the genotype-phenotype mapping. Since only the phenotype is changed, this is referred to as phenotypic repair. If, on the other hand, the gene of step B is moved behind the gene of step A, this is a genotypic repair. The same applies to the alternative shift of gene A in front of gene B. In this case, genotypic repair has the disadvantage that it prevents a meaningful restructuring of the gene sequence in the chromosome if this requires several intermediate steps (mutations) that at least partially violate restrictions.

    Read more →