AI Essay Verification


  • Outline of robotics

    The following outline is provided as an overview of and topical guide to robotics.

    Robotics is a branch of mechanical engineering, electrical engineering and computer science that deals with the design, construction, operation, and application of robots, as well as computer systems for their control, sensory feedback, and information processing. These technologies deal with automated machines that can take the place of humans in dangerous environments or manufacturing processes, or resemble humans in appearance, behaviour, and/or cognition. Many of today's robots are inspired by nature, contributing to the field of bio-inspired robotics. The word "robot" was introduced to the public by Czech writer Karel Čapek in his play R.U.R. (Rossum's Universal Robots), published in 1920. The term "robotics" was coined by Isaac Asimov in his 1941 science fiction short story "Liar!".

    == Nature of robotics ==

    Robotics can be described as:

    An applied science – scientific knowledge transferred into a physical environment.
    A branch of computer science
    A branch of electrical engineering
    A branch of mechanical engineering
    Research and development
    A branch of technology

    == Branches of robotics ==

    Adaptive control – control method used by a controller which must adapt to a controlled system with parameters which vary, or are initially uncertain. For example, as an aircraft flies, its mass will slowly decrease as a result of fuel consumption; a control law is needed that adapts itself to such changing conditions.
    Aerial robotics – development of unmanned aerial vehicles (UAVs), commonly known as drones, aircraft without a human pilot aboard. Their flight is controlled either autonomously by onboard computers or by the remote control of a pilot on the ground or in another vehicle.
    Android science – interdisciplinary framework for studying human interaction and cognition based on the premise that a very humanlike robot (that is, an android) can elicit human-directed social responses in human beings.
    Anthrobotics – science of developing and studying robots that are either entirely or in some way human-like.
    Artificial intelligence – the intelligence of machines and the branch of computer science that aims to create it.
    Artificial neural networks – a mathematical model inspired by biological neural networks.
    Autonomous car – an autonomous vehicle capable of fulfilling the human transportation capabilities of a traditional car.
    Autonomous research robotics
    Bayesian network
    BEAM robotics – a style of robotics that primarily uses simple analogue circuits instead of a microprocessor in order to produce an unusually simple design (in comparison to traditional mobile robots) that trades flexibility for robustness and efficiency in performing the task for which it was designed.
    Behavior-based robotics – the branch of robotics that incorporates modular or behavior-based AI (BBAI).
    Bio-inspired robotics – making robots that are inspired by biological systems. Biomimicry and bio-inspired design are sometimes confused. Biomimicry is copying nature, while bio-inspired design is learning from nature and making a mechanism that is simpler and more effective than the system observed in nature.
    Biomimetic – see Bionics.
    Biomorphic robotics – a sub-discipline of robotics focused upon emulating the mechanics, sensor systems, computing structures and methodologies used by animals.
    Bionics – also known as biomimetics, biognosis, biomimicry, or bionical creativity engineering; the application of biological methods and systems found in nature to the study and design of engineering systems and modern technology.
    Biorobotics – a study of how to make robots that emulate or simulate living biological organisms mechanically or even chemically.
    Cloud robotics – a field of robotics that attempts to invoke cloud technologies such as cloud computing, cloud storage, and other Internet technologies centered around the benefits of converged infrastructure and shared services for robotics.
    Cognitive robotics – views animal cognition as a starting point for the development of robotic information processing, as opposed to more traditional artificial intelligence techniques.
    Clustering
    Computational neuroscience – study of brain function in terms of the information processing properties of the structures that make up the nervous system.
    Robot control – a study of controlling robots.
    Robotics conventions
    Data mining techniques
    Degrees of freedom – in mechanics, the degree of freedom (DOF) of a mechanical system is the number of independent parameters that define its configuration. It is the number of parameters that determine the state of a physical system and is important to the analysis of systems of bodies in mechanical engineering, aeronautical engineering, robotics, and structural engineering.
    Developmental robotics – a methodology that uses metaphors from neural development and developmental psychology to develop the mind for autonomous robots.
    Digital control – a branch of control theory that uses digital computers to act as system controllers.
    Digital image processing – the use of computer algorithms to perform image processing on digital images.
    Dimensionality reduction – the process of reducing the number of random variables under consideration; it can be divided into feature selection and feature extraction.
    Distributed robotics
    Electronic stability control – a computerized technology that improves a vehicle's stability by detecting and reducing loss of traction (skidding).
    Evolutionary computation
    Evolutionary robotics – a methodology that uses evolutionary computation to develop controllers for autonomous robots.
    Extended Kalman filter
    Flexible distribution functions
    Feedback control and regulation
    Human–computer interaction – a study, planning and design of the interaction between people (users) and computers.
    Human–robot interaction – a study of interactions between humans and robots.
    Intelligent vehicle technologies – comprise electronic, electromechanical, and electromagnetic devices, usually silicon micromachined components operating in conjunction with computer-controlled devices and radio transceivers, to provide precision repeatability functions (such as in robotics artificial intelligence systems) and emergency warning validation performance reconstruction.
    Computer vision
    Machine vision
    Kinematics – study of motion, as applied to robots. This includes both the design of linkages to perform motion, their power, control and stability; also their planning, such as choosing a sequence of movements to achieve a broader task.
    Laboratory robotics – the act of using robots in biology or chemistry labs.
    Robot learning – learning to perform tasks such as obstacle avoidance, control and various other motion-related tasks.
    Direct manipulation interface – in computer science, direct manipulation is a human–computer interaction style which involves continuous representation of objects of interest and rapid, reversible, and incremental actions and feedback. The intention is to allow a user to directly manipulate objects presented to them, using actions that correspond at least loosely to the physical world.
    Manifold learning
    Microrobotics – a field of miniature robotics, in particular mobile robots with characteristic dimensions less than 1 mm.
    Motion planning – (a.k.a. the "navigation problem" or the "piano mover's problem") a term used in robotics for the process of detailing a task into discrete motions.
    Motor control – information processing related activities carried out by the central nervous system that organize the musculoskeletal system to create coordinated movements and skilled actions.
    Nanorobotics – the emerging technology field creating machines or robots whose components are at or close to the scale of a nanometer (10⁻⁹ meters).
    Passive dynamics – refers to the dynamical behavior of actuators, robots, or organisms when not drawing energy from a supply (e.g., batteries, fuel, ATP).
    Programming by demonstration – an end-user development technique for teaching a computer or a robot new behaviors by demonstrating the task to transfer directly instead of programming it through machine commands.
    Quantum robotics – a subfield of robotics that deals with using quantum computers to run robotics algorithms more quickly than digital computers can.
    Rapid prototyping – automatic construction of physical objects via additive manufacturing from virtual models in computer-aided design (CAD) software, transforming them into thin, virtual, horizontal cross-sections and then producing successive layers until the items are complete. As of June 2011, used for making models, prototype parts, and production-quality parts in relatively small numbers.
    Reinforcement learning – an area of machine learning in computer science, concerned with how an agent ought to take actions in an environment so as to maximize some notion of cumulative reward.
    Robot

  • Out-of-bag error

    Out-of-bag (OOB) error, also called out-of-bag estimate, is a method of measuring the prediction error of random forests, boosted decision trees, and other machine learning models utilizing bootstrap aggregating (bagging). Bagging uses subsampling with replacement to create training samples for the model to learn from. OOB error is the mean prediction error on each training sample x_i, using only the trees that did not have x_i in their bootstrap sample. Bootstrap aggregating allows one to define an out-of-bag estimate of the prediction performance improvement by evaluating predictions on those observations that were not used in the building of the next base learner.

    == Out-of-bag dataset ==

    When bootstrap aggregating is performed, two independent sets are created. One set, the bootstrap sample, is the data chosen to be "in-the-bag" by sampling with replacement. The out-of-bag set is all data not chosen in the sampling process. When this process is repeated, such as when building a random forest, many bootstrap samples and OOB sets are created. The OOB sets can be aggregated into one dataset, but each sample is only considered out-of-bag for the trees that do not include it in their bootstrap sample. For each bag sampled, the data is separated into two groups. As an example, consider how bagging could be used in the context of diagnosing disease. A set of patients is the original dataset, but each model is trained only on the patients in its bag. The patients in each out-of-bag set can be used to test their respective models. The test would consider whether the model can accurately determine if the patient has the disease.

    == Calculating out-of-bag error ==

    Since each out-of-bag set is not used to train the model, it is a good test for the performance of the model. The specific calculation of OOB error depends on the implementation of the model, but a general calculation is as follows.

    1. Find all models (or trees, in the case of a random forest) that are not trained by the OOB instance.
    2. Take the majority vote of these models' results for the OOB instance and compare it to the true value of the OOB instance.
    3. Compile the OOB error for all instances in the OOB dataset.

    The bagging process can be customized to fit the needs of a model. To ensure an accurate model, the bootstrap training sample size should be close to that of the original set. Also, the number of iterations (trees) of the model (forest) should be considered to find the true OOB error. The OOB error will stabilize over many iterations, so starting with a high number of iterations is a good idea. Once the forest is set up, the OOB error can be found using the method above.

    == Comparison to cross-validation ==

    Out-of-bag error and cross-validation (CV) are different methods of measuring the error estimate of a machine learning model. Over many iterations, the two methods should produce a very similar error estimate. That is, once the OOB error stabilizes, it will converge to the cross-validation (specifically leave-one-out cross-validation) error. The advantage of the OOB method is that it requires less computation and allows one to test the model as it is being trained.

    == Accuracy and Consistency ==

    Out-of-bag error is used frequently for error estimation within random forests, but according to a study by Silke Janitza and Roman Hornung, out-of-bag error has been shown to overestimate the error in settings that include an equal number of observations from all response classes (balanced samples), small sample sizes, a large number of predictor variables, small correlation between predictors, and weak effects.
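The general OOB calculation described in this excerpt can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the base learner (`fit_stump`, a one-feature threshold classifier), the toy dataset, and the names `oob_error`, `n_trees` and `seed` are all hypothetical and do not come from the article or from any particular library.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Toy base learner (an assumption): the best single-threshold classifier."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def oob_error(data, n_trees=50, seed=0):
    rng = random.Random(seed)
    n = len(data)
    votes = [Counter() for _ in range(n)]            # OOB votes per instance
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample with replacement
        in_bag = set(bag)
        model = fit_stump([data[i] for i in bag])
        for i in range(n):
            if i not in in_bag:                      # this model never saw instance i
                votes[i][model(data[i][0])] += 1
    scored = [i for i in range(n) if votes[i]]       # instances with at least one OOB vote
    wrong = sum(votes[i].most_common(1)[0][0] != data[i][1] for i in scored)
    return wrong / len(scored)

# Hypothetical 1-D dataset: the label is 1 when the feature exceeds 0.5.
data = [(i / 40, int(i / 40 > 0.5)) for i in range(40)]
err = oob_error(data)
print(f"OOB error estimate: {err:.3f}")
```

Each instance is scored only by the models whose bootstrap sample excluded it, which is what lets the estimate be computed during training without a separate hold-out set.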

  • Locality-sensitive hashing

    In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. The number of buckets is much smaller than the universe of possible input items. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

    Hashing-based approximate nearest-neighbor search algorithms generally use one of two main categories of hashing methods: either data-independent methods, such as locality-sensitive hashing (LSH); or data-dependent methods, such as locality-preserving hashing (LPH). Locality-preserving hashing was initially devised as a way to facilitate data pipelining in implementations of massively parallel algorithms that use randomized routing and universal hashing to reduce memory contention and network congestion.

    == Definitions ==

    A finite family $\mathcal{F}$ of functions $h\colon M\to S$ is defined to be an LSH family for a metric space $\mathcal{M}=(M,d)$, a threshold $r>0$, an approximation factor $c>1$, and probabilities $p_1>p_2$ if it satisfies the following condition. For any two points $a,b\in M$ and a hash function $h$ chosen uniformly at random from $\mathcal{F}$:

    If $d(a,b)\leq r$, then $h(a)=h(b)$ (i.e., $a$ and $b$ collide) with probability at least $p_1$.
    If $d(a,b)\geq cr$, then $h(a)=h(b)$ with probability at most $p_2$.

    Such a family $\mathcal{F}$ is called $(r,cr,p_1,p_2)$-sensitive.

    === LSH with respect to a similarity measure ===

    Alternatively, it is possible to define an LSH family on a universe of items $U$ endowed with a similarity function $\phi\colon U\times U\to[0,1]$. In this setting, an LSH scheme is a family of hash functions $H$ coupled with a probability distribution $D$ over $H$ such that a function $h\in H$ chosen according to $D$ satisfies $\Pr[h(a)=h(b)]=\phi(a,b)$ for each $a,b\in U$.

    === Amplification ===

    Given a $(d_1,d_2,p_1,p_2)$-sensitive family $\mathcal{F}$, we can construct new families $\mathcal{G}$ by either the AND-construction or OR-construction of $\mathcal{F}$.

    To create an AND-construction, we define a new family $\mathcal{G}$ of hash functions $g$, where each function $g$ is constructed from $k$ random functions $h_1,\ldots,h_k$ from $\mathcal{F}$. We then say that for a hash function $g\in\mathcal{G}$, $g(x)=g(y)$ if and only if all $h_i(x)=h_i(y)$ for $i=1,2,\ldots,k$. Since the members of $\mathcal{F}$ are independently chosen for any $g\in\mathcal{G}$, $\mathcal{G}$ is a $(d_1,d_2,p_1^k,p_2^k)$-sensitive family.

    To create an OR-construction, we define a new family $\mathcal{G}$ of hash functions $g$, where each function $g$ is constructed from $k$ random functions $h_1,\ldots,h_k$ from $\mathcal{F}$. We then say that for a hash function $g\in\mathcal{G}$, $g(x)=g(y)$ if and only if $h_i(x)=h_i(y)$ for one or more values of $i$. Since the members of $\mathcal{F}$ are independently chosen for any $g\in\mathcal{G}$, $\mathcal{G}$ is a $(d_1,d_2,1-(1-p_1)^k,1-(1-p_2)^k)$-sensitive family.

    == Applications ==

    LSH has been applied to several problem domains, including: near-duplicate detection; hierarchical clustering; genome-wide association study; image similarity identification; VisualRank; gene expression similarity identification; audio similarity identification; nearest neighbor search; audio fingerprint; digital video fingerprinting; shared memory organization in parallel computing; physical data organization in database management systems; training fully connected neural networks; computer security; machine learning.

    == Methods ==

    === Bit sampling for Hamming distance ===

    One of the easiest ways to construct an LSH family is by bit sampling. This approach works for the Hamming distance over $d$-dimensional vectors $\{0,1\}^d$. Here, the family $\mathcal{F}$ of hash functions is simply the family of all the projections of points on one of the $d$ coordinates, i.e., $\mathcal{F}=\{h\colon\{0,1\}^d\to\{0,1\}\mid h(x)=x_i\text{ for some }i\in\{1,\ldots,d\}\}$, where $x_i$ is the $i$th coordinate of $x$. A random function $h$ from $\mathcal{F}$ simply selects a random bit from the input point. This family has the following parameters: $P_1=1-R/d$, $P_2=1-cR/d$. That is, any two vectors $x,y$ with Hamming distance at most $R$ collide under a random $h$ with probability at least $P_1$. Any $x,y$ with Hamming distance at least $cR$ collide with probability at most $P_2$.

    === Min-wise independent permutations ===

    Suppose $U$ is composed of subsets of some ground set of enumerable items $S$ and the similarity function of interest is the Jaccard index $J$. If $\pi$ is a permutation on the indices of $S$, for $A\subseteq S$ let $h(A)=\min_{a\in A}\{\pi(a)\}$. Each possible choice of $\pi$ defines a single hash function $h$ mapping input sets to elements of $S$. Define the function family $H$ to be the set of all such functions and let $D$ be the uniform distribution. Given two sets $A,B\subseteq S$, the event that $h(A)=h(B)$ corresponds exactly to the event that the minimizer of $\pi$ over $A\cup B$ lies inside $A\cap B$. As $h$ was chosen uniformly at random, $\Pr[h(A)=h(B)]=J(A,B)$ and $(H,D)$ define an LSH scheme for the Jaccard index.

    Because the symmetric group on $n$ elements has size $n!$, choosing a truly random permutation from the full symmetric group is infeasible for even moderately sized $n$. Because of this fact, there has been significant work on finding a family of permutations that is "min-wise independent": a permutation family for which each element of the domain has equal probability of being the minimum under a randomly chosen $\pi$. It has been established that a min-wise independent family of permutations is at least of size $\operatorname{lcm}\{1,2,\ldots,n\}\geq e^{n-o(n)}$, and that this bound is tight.

    Because min-wise independent families are too big for practical applications, two variant notions of min-wise independence are introduced: restricted min-wise independent permutations families, and approximate min-wise independent families. Restricted min-wise independence is the min-wise independence property restricted to certain sets of cardinality at most $k$. Approximate min-wise independence differs from the property by at most a fixed $\varepsilon$.

    === Open source methods ===

    ==== Nilsimsa Hash ====

    Nilsimsa is a locality-sensitive hashing algorithm used in anti-spam efforts. The goal of Nilsimsa is to generate a hash digest of an email message such that the digests of two similar messages are similar to each other. The paper suggests that the Nilsimsa satisfies three requirements: The digest identifying each message should not
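As a rough illustration of the min-wise hashing scheme and the amplification formulas from this excerpt, the sketch below estimates a Jaccard index from the fraction of colliding min-hashes and evaluates the AND/OR collision probabilities. It assumes a small universe where fully random permutations are affordable (the excerpt notes that this does not scale), and all function names, the sets `A` and `B`, and the choice of 400 permutations are hypothetical.

```python
import random

def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def random_permutations(universe, k, seed=0):
    """k random permutations of the universe, each stored as item -> rank."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)
        perms.append(dict(zip(universe, ranks)))
    return perms

def minhash_signature(items, perms):
    """h(A) = min over a in A of pi(a), one value per permutation pi."""
    return [min(pi[x] for x in items) for pi in perms]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of permutations under which the two sets collide."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

def and_or_probabilities(p, k):
    """Collision probability after the AND- and OR-constructions with k functions."""
    return p ** k, 1 - (1 - p) ** k

universe = list(range(30))
A, B = set(range(0, 20)), set(range(10, 30))
perms = random_permutations(universe, k=400)
estimate = estimate_jaccard(minhash_signature(A, perms), minhash_signature(B, perms))
print(f"exact J = {jaccard(A, B):.3f}, MinHash estimate = {estimate:.3f}")
print("AND/OR with p = 0.8, k = 3:", and_or_probabilities(0.8, 3))
```

The estimate is an average of independent collision indicators, so its accuracy improves as the number of permutations grows; practical systems replace the explicit permutations with cheaper hash families, which this sketch does not attempt.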

  • Random indexing

    Random indexing is a dimensionality reduction method and computational framework for distributional semantics. It is based on three insights: that very-high-dimensional vector space model implementations are impractical; that models need not grow in dimensionality when new items (e.g. new terminology) are encountered; and that a high-dimensional model can be projected into a space of lower dimensionality without compromising L2 distance metrics if the resulting dimensions are chosen appropriately. This is the original point of the random projection approach to dimension reduction, first formulated as the Johnson–Lindenstrauss lemma, and locality-sensitive hashing has some of the same starting points. Random indexing, as used in representation of language, originates from the work of Pentti Kanerva on sparse distributed memory, and can be described as an incremental formulation of a random projection. It can also be verified that random indexing is a random projection technique for the construction of Euclidean spaces, i.e. L2 normed vector spaces. In Euclidean spaces, random projections are elucidated using the Johnson–Lindenstrauss lemma. The TopSig technique extends the random indexing model to produce bit vectors for comparison with the Hamming distance similarity function. It is used for improving the performance of information retrieval and document clustering. In a similar line of research, Random Manhattan Integer Indexing (RMII) is proposed for improving the performance of methods that employ the Manhattan distance between text units. Many random indexing methods primarily generate similarity from co-occurrence of items in a corpus. Reflexive Random Indexing (RRI) generates similarity from co-occurrence and from shared occurrence with other items.
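The excerpt does not spell out the mechanics, so the following is a minimal sketch of one common way random indexing is set up, under assumptions: every word gets a fixed sparse ternary "index vector", and a word's context vector is the running sum of the index vectors of the words it co-occurs with. The dimension `DIM`, the number of non-zero entries `NONZERO`, the whole-sentence context window, and the toy corpus are all hypothetical choices.

```python
import random
from collections import defaultdict

DIM, NONZERO = 512, 8   # hypothetical sizes for this toy example

def index_vector(word):
    """Fixed sparse ternary vector for a word, stored as position -> +1/-1."""
    rng = random.Random(word)   # seeding by the word keeps its vector reproducible
    positions = rng.sample(range(DIM), NONZERO)
    return {p: rng.choice((-1, 1)) for p in positions}

def train(sentences):
    """Accumulate context vectors incrementally; DIM never grows with the vocabulary."""
    context = defaultdict(lambda: [0] * DIM)
    for sentence in sentences:
        words = sentence.split()
        for i, target in enumerate(words):
            for j, neighbour in enumerate(words):
                if i != j:
                    for pos, sign in index_vector(neighbour).items():
                        context[target][pos] += sign
    return context

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

corpus = [
    "the cat drinks milk", "the dog drinks water",
    "the cat chases mice", "the dog chases cars",
    "stocks fell sharply today", "bonds fell sharply today",
]
vectors = train(corpus)
print("cat vs dog:   ", round(cosine(vectors["cat"], vectors["dog"]), 3))
print("cat vs stocks:", round(cosine(vectors["cat"], vectors["stocks"]), 3))
```

Because new words only add new index vectors and never new dimensions, the model can be updated one sentence at a time, which is the incremental property the excerpt attributes to the method.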

  • Neurorobotics

    Neurorobotics is the combined study of neuroscience, robotics, and artificial intelligence. It is the science and technology of embodied autonomous neural systems. Neural systems include brain-inspired algorithms (e.g. connectionist networks), computational models of biological neural networks (e.g. artificial spiking neural networks, large-scale simulations of neural microcircuits) and actual biological systems (e.g. in vivo and in vitro neural nets). Such neural systems can be embodied in machines with mechanical or any other form of physical actuation. This includes robots and prosthetic or wearable systems but also, at smaller scales, micro-machines and, at larger scales, furniture and infrastructures. Neurorobotics is the branch of neuroscience combined with robotics that deals with the study and application of the science and technology of embodied autonomous neural systems, such as brain-inspired algorithms. It is based on the idea that the brain is embodied and the body is embedded in the environment. Therefore, most neurorobots are required to function in the real world, as opposed to a simulated environment. Beyond brain-inspired algorithms for robots, neurorobotics may also involve the design of brain-controlled robot systems.

    == Major classes of models ==

    Neurorobots can be divided into various major classes based on the robot's purpose. Each class is designed to implement a specific mechanism of interest for study. Common types of neurorobots are those used to study motor control, memory, action selection, and perception.

    === Locomotion and motor control ===

    Neurorobots are often used to study motor feedback and control systems, and have proved their merit in developing controllers for robots. Locomotion is modeled by a number of neurologically inspired theories on the action of motor systems. Locomotion control has been mimicked using models of central pattern generators, clumps of neurons capable of driving repetitive behavior, to make four-legged walking robots. Other groups have expanded the idea of combining rudimentary control systems into a hierarchical set of simple autonomous systems. These systems can formulate complex movements from a combination of these rudimentary subsets. This theory of motor action is based on the organization of cortical columns, which progressively integrate from simple sensory input into complex afferent signals, or from complex motor programs to simple controls for each muscle fiber in efferent signals, forming a similar hierarchical structure. Another method for motor control uses learned error correction and predictive controls to form a sort of simulated muscle memory. In this model, awkward, random, and error-prone movements are corrected using error feedback to produce smooth and accurate movements over time. The controller learns to create the correct control signal by predicting the error. Using these ideas, robots have been designed which can learn to produce adaptive arm movements or to avoid obstacles in a course.

    === Learning and memory systems ===

    Robots are also designed to test theories of animal memory systems. Many studies examine the memory system of rats, particularly the rat hippocampus, dealing with place cells, which fire for a specific location that has been learned. Systems modeled after the rat hippocampus are generally able to learn mental maps of the environment, including recognizing landmarks and associating behaviors with them, allowing them to predict upcoming obstacles and landmarks. Another study has produced a robot based on the proposed learning paradigm of barn owls for orientation and localization based primarily on auditory, but also visual, stimuli. The hypothesized method involves synaptic plasticity and neuromodulation, a mostly chemical effect in which reward neurotransmitters such as dopamine or serotonin sharpen the firing sensitivity of a neuron. The robot used in the study adequately matched the behavior of barn owls. Furthermore, the close interaction between motor output and auditory feedback proved to be vital in the learning process, supporting active sensing theories that are involved in many of the learning models. Neurorobots in these studies are presented with simple mazes or patterns to learn. Some of the problems presented to the neurorobot include recognizing symbols, colors, or other patterns and executing simple actions based on the pattern. In the case of the barn owl simulation, the robot had to determine its location and direction to navigate in its environment.

    === Action selection and value systems ===

    Action selection studies deal with negative or positive weighting of an action and its outcome. Neurorobots can be, and have been, used to study simple ethical interactions, such as the classical thought experiment where there are more people than a life raft can hold, and someone must leave the boat to save the rest. However, more neurorobots used in the study of action selection contend with much simpler persuasions such as self-preservation or perpetuation of the population of robots in the study. These neurorobots are modeled after the neuromodulation of synapses to encourage circuits with positive results. In biological systems, neurotransmitters such as dopamine or acetylcholine positively reinforce neural signals that are beneficial. One study of such interaction involved the robot Darwin VII, which used visual, auditory, and simulated taste inputs to "eat" conductive metal blocks. The arbitrarily chosen good blocks had a striped pattern on them while the bad blocks had a circular shape on them. The taste sense was simulated by the conductivity of the blocks. The robot received positive or negative feedback for the taste based on the block's level of conductivity. The researchers observed the robot to see how it learned its action selection behaviors based on the inputs it had. Other studies have used herds of small robots which feed on batteries strewn about the room and communicate their findings to other robots.

    === Sensory perception ===

    Neurorobots have also been used to study sensory perception, particularly vision. These are primarily systems that result from embedding neural models of sensory pathways in automata. This approach gives exposure to the sensory signals that occur during behavior and also enables a more realistic assessment of the degree of robustness of the neural model. It is well known that changes in the sensory signals produced by motor activity provide useful perceptual cues that are used extensively by organisms. For example, researchers have used the depth information that emerges during replication of human head and eye movements to establish robust representations of the visual scene.

    == Biological robots ==

    Biological robots are not officially neurorobots in that they are not neurologically inspired AI systems, but actual neuron tissue wired to a robot. This approach uses cultured neural networks to study brain development or neural interactions. These typically consist of a neural culture raised on a multielectrode array (MEA), which is capable of both recording the neural activity and stimulating the tissue. In some cases, the MEA is connected to a computer which presents a simulated environment to the brain tissue and translates brain activity into actions in the simulation, as well as providing sensory feedback. The ability to record neural activity gives researchers a window into a brain, which they can use to learn about a number of the same issues neurorobots are used for. An area of concern with biological robots is ethics. Many questions are raised about how to treat such experiments. The central question concerns consciousness and whether or not the rat brain experiences it. There are many theories about how to define consciousness.

    == Implications for neuroscience ==

    Neuroscientists benefit from neurorobotics because it provides a blank slate to test various possible methods of brain function in a controlled and testable environment. While robots are more simplified versions of the systems they emulate, they are more specific, allowing more direct testing of the issue at hand. They also have the benefit of being accessible at all times, while it is more difficult to monitor large portions of a brain, especially individual neurons, while the human or animal is active. The development of neuroscience has produced neural treatments. These include pharmaceuticals and neural rehabilitation. Progress is dependent on an intricate understanding of the brain and how exactly it functions. It is difficult to study the brain, especially in humans, due to the danger associated with cranial surgeries. Neurorobots can improve the range of tests and experiments that can be performed in the study of neural processes.

    Read more →
  • Random neural network

    Random neural network

    The Random Neural Network (RNN) is a mathematical representation of an interconnected network of neurons or cells which exchange spiking signals. It was invented by Erol Gelenbe and is linked to the G-network model of queueing networks, which he also invented, and to his Gene Regulatory Network models. In this model, each neuronal cell state is represented by an integer whose value rises when the cell receives an excitatory spike and drops when it receives an inhibitory spike. The spikes can originate outside the network itself, or they can come from other cells in the network. Cells whose internal excitatory state has a positive value are allowed to send out spikes of either kind to other cells in the network according to specific cell-dependent spiking rates. The model has a mathematical solution in steady-state which provides the joint probability distribution of the network in terms of the individual probabilities that each cell is excited and able to send out spikes. Computing this solution is based on solving a set of non-linear algebraic equations whose parameters are related to the spiking rates of individual cells and their connectivity to other cells, as well as the arrival rates of spikes from outside the network. The RNN is a recurrent model, i.e. a neural network that is allowed to have complex feedback loops. A highly energy-efficient implementation of random neural networks was demonstrated by Krishna Palem et al. using the Probabilistic CMOS or PCMOS technology and was shown to be c. 226–300 times more efficient in terms of Energy-Performance-Product. RNNs are also related to artificial neural networks, which (like the random neural network) have gradient-based learning algorithms. The learning algorithm for an n-node random neural network that includes feedback loops (it is also a recurrent neural network) is of computational complexity O(n^3) (the number of computations is proportional to the cube of n, the number of neurons).
The random neural network can also be used with other learning algorithms such as reinforcement learning. The RNN has been shown to be a universal approximator for bounded and continuous functions.
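
The steady-state solution mentioned above can be illustrated with a small numerical sketch. The excerpt does not state the equations, so the code assumes the commonly cited form of the model: the excitation probability of cell i satisfies q_i = λ⁺_i / (r_i + λ⁻_i), where λ⁺_i and λ⁻_i are the total excitatory and inhibitory spike arrival rates (external arrivals plus spikes from other excited cells) and r_i is the firing rate of cell i. The function name, argument names and the simple fixed-point iteration are illustrative choices, not part of the original text.

```python
def rnn_steady_state(w_plus, w_minus, ext_plus, ext_minus, rates,
                     max_iter=1000, tol=1e-12):
    """Approximate the excitation probabilities q_i by fixed-point iteration.

    w_plus[j][i]  : rate of excitatory spikes sent from cell j to cell i
    w_minus[j][i] : rate of inhibitory spikes sent from cell j to cell i
    ext_plus[i]   : external excitatory arrival rate at cell i
    ext_minus[i]  : external inhibitory arrival rate at cell i
    rates[i]      : total firing rate r_i of cell i (assumed > 0)
    """
    n = len(rates)
    q = [0.0] * n
    for _ in range(max_iter):
        new_q = []
        for i in range(n):
            lam_plus = ext_plus[i] + sum(q[j] * w_plus[j][i] for j in range(n))
            lam_minus = ext_minus[i] + sum(q[j] * w_minus[j][i] for j in range(n))
            # The closed form is only meaningful while q_i < 1, so clip at 1.
            new_q.append(min(1.0, lam_plus / (rates[i] + lam_minus)))
        if max(abs(a - b) for a, b in zip(q, new_q)) < tol:
            return new_q
        q = new_q
    return q


# Two cells in a feed-forward chain: cell 0 excites cell 1,
# and cell 1 also receives external inhibitory spikes.
w_plus = [[0.0, 1.0], [0.0, 0.0]]
w_minus = [[0.0, 0.0], [0.0, 0.0]]
q = rnn_steady_state(w_plus, w_minus,
                     ext_plus=[0.5, 0.0], ext_minus=[0.0, 1.0],
                     rates=[1.0, 1.0])
```

Plain iteration is used here only because it is short; it is not guaranteed to be the method used in the literature, and networks with strong feedback may need a more careful solver.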

    Read more →
  • Probit model

    Probit model

    In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from probability + unit. The purpose of the model is to estimate the probability that an observation with particular characteristics will fall into a specific one of the categories; moreover, classifying observations based on their predicted probabilities is a type of binary classification model. A probit model is a popular specification for a binary response model. As such it treats the same set of problems as does logistic regression using similar techniques. When viewed in the generalized linear model framework, the probit model employs a probit link function. It is most often estimated using the maximum likelihood procedure, such an estimation being called a probit regression. == Conceptual framework == Suppose a response variable Y is binary, that is, it can have only two possible outcomes, which we will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition, success/failure of some device, answer yes/no on a survey, etc. We also have a vector of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the model takes the form $P(Y=1\mid X)=\Phi(X^{\operatorname{T}}\beta)$, where P is the probability and $\Phi$ is the cumulative distribution function (CDF) of the standard normal distribution. The parameters β are typically estimated by maximum likelihood. It is possible to motivate the probit model as a latent variable model. Suppose there exists an auxiliary random variable $Y^{\ast}=X^{\operatorname{T}}\beta+\varepsilon$, where ε ~ N(0, 1).
Then Y can be viewed as an indicator for whether this latent variable is positive: $Y=\left.\begin{cases}1&Y^{\ast}>0\\0&\text{otherwise}\end{cases}\right\}=\left.\begin{cases}1&X^{\operatorname{T}}\beta+\varepsilon>0\\0&\text{otherwise}\end{cases}\right\}$ The use of the standard normal distribution causes no loss of generality compared with the use of a normal distribution with an arbitrary mean and standard deviation, because adding a fixed amount to the mean can be compensated by subtracting the same amount from the intercept, and multiplying the standard deviation by a fixed amount can be compensated by multiplying the weights by the same amount. To see that the two models are equivalent, note that $\begin{aligned}P(Y=1\mid X)&=P(Y^{\ast}>0)\\&=P(X^{\operatorname{T}}\beta+\varepsilon>0)\\&=P(\varepsilon>-X^{\operatorname{T}}\beta)\\&=P(\varepsilon<X^{\operatorname{T}}\beta)&&\text{by symmetry of the normal distribution}\\&=\Phi(X^{\operatorname{T}}\beta)\end{aligned}$ […] $t,\ \lim_{n\rightarrow\infty}n_{t}/n=c_{t}>0$.
Denote $\hat{p}_{t}=r_{t}/n_{t}$ and $\hat{\sigma}_{t}^{2}=\frac{1}{n_{t}}\frac{\hat{p}_{t}(1-\hat{p}_{t})}{\varphi^{2}\big(\Phi^{-1}(\hat{p}_{t})\big)}$. Then Berkson's minimum chi-square estimator is a generalized least squares estimator in a regression of $\Phi^{-1}(\hat{p}_{t})$ on $x_{(t)}$ with weights $\hat{\sigma}_{t}^{-2}$: $\hat{\beta}=\Bigg(\sum_{t=1}^{T}\hat{\sigma}_{t}^{-2}x_{(t)}x_{(t)}^{\operatorname{T}}\Bigg)^{-1}\sum_{t=1}^{T}\hat{\sigma}_{t}^{-2}x_{(t)}\Phi^{-1}(\hat{p}_{t})$ It can be shown that this estimator is consistent (as n→∞ and T fixed), asymptotically normal and efficient. Its advantage is the presence of a closed-form formula for the estimator. However, it is only meaningful to carry out this analysis when individual observations are not available, only their aggregated counts $r_{t}$, $n_{t}$
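
The closed-form estimator above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the definitions of the counts are not present in this excerpt, so the code assumes that n_t is the number of observations sharing the regressor value x_(t) and r_t is the number of those with outcome 1. It also assumes a model with an intercept and a single scalar regressor, so that x_(t) = (1, x_t) and the weighted normal equations form a 2×2 system. The function name `berkson_probit` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def berkson_probit(xs, ns, rs):
    """Berkson-style minimum chi-square estimate (b0, b1) for grouped data.

    xs : distinct regressor values x_t (scalar regressor, intercept added here)
    ns : n_t, number of observations in each cell (assumed meaning)
    rs : r_t, number of observations with outcome 1 in each cell (assumed meaning)
    """
    # Accumulate the weighted normal equations A b = c with x_(t) = (1, x_t).
    a00 = a01 = a11 = c0 = c1 = 0.0
    for x, n, r in zip(xs, ns, rs):
        p_hat = r / n
        if not 0.0 < p_hat < 1.0:
            raise ValueError("each cell needs 0 < r_t < n_t")
        z = STD_NORMAL.inv_cdf(p_hat)                      # Phi^{-1}(p_hat_t)
        var = p_hat * (1.0 - p_hat) / (n * STD_NORMAL.pdf(z) ** 2)
        w = 1.0 / var                                      # weight sigma_t^{-2}
        a00 += w
        a01 += w * x
        a11 += w * x * x
        c0 += w * z
        c1 += w * x * z
    det = a00 * a11 - a01 * a01
    b0 = (a11 * c0 - a01 * c1) / det
    b1 = (a00 * c1 - a01 * c0) / det
    return b0, b1


# Noise-free check: cell frequencies generated exactly from beta = (0.5, 0.8).
xs = [-1.0, 0.0, 1.0, 2.0]
ns = [100.0] * 4
rs = [n * STD_NORMAL.cdf(0.5 + 0.8 * x) for x, n in zip(xs, ns)]
b0, b1 = berkson_probit(xs, ns, rs)
```

With real data the cell frequencies are noisy, and cells with r_t equal to 0 or n_t have to be merged or dropped because Φ⁻¹ is undefined at 0 and 1.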

    Read more →
  • Persian Speech Corpus

    Persian Speech Corpus

    The Persian Speech Corpus is a Modern Persian speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of about 2.5 hours of Persian speech aligned with recorded speech on the phoneme level, including annotations of word boundaries. Previous spoken corpora of Persian include FARSDAT, which consists of read-aloud speech from newspaper texts from 100 Persian speakers, and the Telephone FARsi Spoken language DATabase (TFARSDAT), which comprises seven hours of read and spontaneous speech produced by 60 native speakers of Persian from ten regions of Iran. The Persian Speech Corpus was built using the same methodologies laid out in the doctoral project on Modern Standard Arabic of Nawar Halabi at the University of Southampton. The work was funded by MicroLinkPC, who own an exclusive license to commercialise the corpus, though the corpus is available for non-commercial use through the corpus' website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The corpus was built for speech synthesis purposes and has been used for building HMM-based voices in Persian. It can also be used to automatically align other speech corpora with their phonetic transcript and could be used as part of a larger corpus for training speech recognition systems. == Contents == The corpus is downloadable from its website, and contains the following: 396 .wav files containing spoken utterances; 396 .lab files containing text utterances; 396 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files; phonetic-transcript.txt, which has the form "[wav_filename]" "[Phoneme Sequence]" in every line; and orthographic-transcript.txt, which has the form "[wav_filename]" "[Orthographic Transcript]" in every line.
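
As a rough sketch of how the two transcript files described above might be read, the following Python function parses lines of the form "[wav_filename]" "[text]". It assumes that each non-empty line holds exactly two double-quoted fields separated by whitespace, which is inferred from the description and has not been checked against the actual files. The function name and the sample line are made up for illustration.

```python
import re

# Two double-quoted fields separated by whitespace (assumed line layout).
LINE_RE = re.compile(r'^\s*"([^"]*)"\s+"([^"]*)"\s*$')


def parse_transcript(lines):
    """Map each wav file name to its transcript string."""
    entries = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unexpected line format: {line!r}")
        entries[match.group(1)] = match.group(2)
    return entries


# Hypothetical sample line; real file names and phoneme symbols may differ.
sample = ['"utt001.wav" "s a l aa m"', ""]
parsed = parse_transcript(sample)
```

For a real file, the same function can be applied to an open file handle, for example `parse_transcript(open(path, encoding="utf-8"))`, where `path` points at phonetic-transcript.txt or orthographic-transcript.txt.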

    Read more →
  • Box blur

    Box blur

    A box blur (also known as a box linear filter) is a spatial domain linear filter in which each pixel in the resulting image has a value equal to the average value of its neighboring pixels in the input image. It is a form of low-pass ("blurring") filter. A 3 by 3 box blur ("radius 1") can be written as the matrix $\frac{1}{9}\begin{bmatrix}1&1&1\\1&1&1\\1&1&1\end{bmatrix}$. Due to its property of using equal weights, it can be implemented using a much simpler accumulation algorithm, which is significantly faster than using a sliding-window algorithm. Box blurs are frequently used to approximate a Gaussian blur. By the central limit theorem, repeated application of a box blur will approximate a Gaussian blur. In the frequency domain, a box blur has zeros and negative components. That is, a sine wave with a period equal to the size of the box will be blurred away entirely, and wavelengths shorter than the size of the box may be phase-reversed, as seen when two bokeh circles touch to form a bright spot where there would be a dark spot between two bright spots in the original image. == Extensions == Gwosdek et al. have extended the box blur to take a fractional radius: the edges of the 1-D filter are expanded with a fraction. This makes a slightly better Gaussian approximation possible due to the elimination of integer-rounding error. Mario Klingemann has a "stack blur" that tries to better emulate the Gaussian's look in one pass by stacking weights: $\frac{1}{9}\begin{bmatrix}1&2&3&2&1\end{bmatrix}$ The triangular impulse response it forms decomposes to two rounds of box blur. Stacked Integral Image by Bhatia et al. takes the weighted average of a few box blurs to fit the Gaussian response curve. == Implementation == The following pseudocode implements a 3x3 box blur. The example does not handle the edges of the image, which would not fit inside the kernel, so that these areas remain unblurred.
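
The pseudocode referred to above is not reproduced in this excerpt. As a stand-in, here is a minimal Python sketch of a 3x3 box blur on a grayscale image stored as a list of rows. The function name and the list-of-lists representation are assumptions made for illustration. As described, border pixels are copied through unchanged.

```python
def box_blur_3x3(image):
    """Return a blurred copy of a 2D grayscale image (list of equal-length rows).

    Each interior pixel becomes the mean of its 3x3 neighbourhood;
    pixels on the border are left unblurred.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [list(row) for row in image]  # copy so the input stays untouched
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    total += image[y + dy][x + dx]
            result[y][x] = total / 9
    return result


# A single bright pixel is spread over its neighbourhood; here only the
# centre is an interior pixel, so it is the only value that changes.
source = [[0, 0, 0],
          [0, 9, 0],
          [0, 0, 0]]
blurred = box_blur_3x3(source)
```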
In practice, the issue is better handled by either introducing an alpha channel to represent the absence of colors, or extending the boundary by filling in values. The fill options, ranked by quality, are: fill in a mirrored image at the border; fill in a constant color extending from the last pixel; pad in a fixed color. A number of optimizations can be applied when implementing the box blur of a radius r and N pixels. Separability: the box blur is a separable filter, so that only two 1D passes of averaging 2r + 1 pixels will be needed, one horizontal and one vertical, for each pixel. This lowers the complexity from O(Nr²) to O(Nr). In digital signal processing terminology, each pass is a moving-average filter. Accumulation: instead of discarding the sum for each pixel, the algorithm re-uses the previous sum, and updates it by subtracting away the old pixel and adding the new pixel in the blurring range. A summed-area table can be used similarly. This lowers the complexity from O(Nr) to O(N). When being used in multiple passes to approximate a Gaussian blur, the cascaded integrator–comb filter construction allows for doing the equivalent operation in a single pass.
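
The separable and accumulation optimizations can be combined as in the following Python sketch. It is one possible arrangement, not the only one: the border is handled by extending the last pixel (the second fill option listed above), and the function names are illustrative.

```python
def box_blur_1d(values, r):
    """Moving average of width 2r + 1 using a running sum.

    Positions outside the sequence reuse the nearest edge value.
    """
    n = len(values)
    if n == 0:
        return []

    def at(i):
        # Clamp the index so the border is extended from the last pixel.
        return values[min(max(i, 0), n - 1)]

    width = 2 * r + 1
    total = sum(at(i) for i in range(-r, r + 1))  # window centred on index 0
    result = []
    for i in range(n):
        result.append(total / width)
        # Slide the window: add the entering pixel, drop the leaving one.
        total += at(i + r + 1) - at(i - r)
    return result


def box_blur(image, r):
    """Separable box blur: one horizontal pass, then one vertical pass."""
    rows = [box_blur_1d(row, r) for row in image]
    cols = [box_blur_1d(list(col), r) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


line = box_blur_1d([0, 0, 9, 0, 0], 1)
flat = box_blur([[4, 4, 4], [4, 4, 4]], 2)
```

After the initial window sum, each output pixel costs one addition and one subtraction regardless of r, which is where the O(N) behaviour comes from.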

    Read more →
  • Fitness function

    Fitness function

    A fitness function is a particular type of objective or cost function that is used to summarize, as a single figure of merit, how close a given candidate solution is to achieving the set aims. It is an important component of evolutionary algorithms (EA), such as genetic programming, evolution strategies or genetic algorithms. An EA is a metaheuristic that reproduces the basic principles of biological evolution as a computer algorithm in order to solve challenging optimization or planning tasks, at least approximately. For this purpose, many candidate solutions are generated, which are evaluated using a fitness function in order to guide the evolutionary development towards the desired goal. Similar quality functions are also used in other metaheuristics, such as ant colony optimization or particle swarm optimization. In the field of EAs, each candidate solution, also called an individual, is commonly represented as a string of numbers (referred to as a chromosome). After each round of testing or simulation the idea is to delete the n worst individuals, and to breed n new ones from the best solutions. Each individual must therefore be assigned a quality number indicating how close it has come to the overall specification, and this is generated by applying the fitness function to the test or simulation results obtained from that candidate solution. Two main classes of fitness functions exist: one where the fitness function does not change, as in optimizing a fixed function or testing with a fixed set of test cases; and one where the fitness function is mutable, as in niche differentiation or co-evolving the set of test cases. Another way of looking at fitness functions is in terms of a fitness landscape, which shows the fitness for each possible chromosome. In the following, it is assumed that the fitness is determined based on an evaluation that remains unchanged during an optimization run.
A fitness function does not necessarily have to be able to calculate an absolute value, as it is sometimes sufficient to compare candidates in order to select the better one. A relative indication of fitness (candidate a is better than b) is sufficient in some cases, such as tournament selection or Pareto optimization. == Requirements of evaluation and fitness function == The quality of the evaluation and calculation of a fitness function is fundamental to the success of an EA optimisation. It implements Darwin's principle of "survival of the fittest". Without fitness-based selection mechanisms for mate selection and offspring acceptance, EA search would be blind and hardly distinguishable from the Monte Carlo method. When setting up a fitness function, one must always be aware that it is about more than just describing the desired target state. Rather, the evolutionary search on the way to the optimum should also be supported as much as possible (see also section on auxiliary objectives), if and insofar as this is not already done by the fitness function alone. If the fitness function is designed badly, the algorithm will either converge on an inappropriate solution, or will have difficulty converging at all. Definition of the fitness function is not straightforward in many cases and often is performed iteratively if the fittest solutions produced by an EA are not what is desired. Interactive genetic algorithms address this difficulty by outsourcing evaluation to external agents, which are normally humans. == Computational efficiency == The fitness function should not only closely align with the designer's goal, but also be computationally efficient. Execution speed is crucial, as a typical evolutionary algorithm must be iterated many times in order to produce a usable result for a non-trivial problem.
Fitness approximation may be appropriate, especially in the following cases: the fitness computation time of a single solution is extremely high; a precise model for fitness computation is missing; the fitness function is uncertain or noisy. Alternatively, or in addition to fitness approximation, the fitness calculations can be distributed to a parallel computer in order to reduce the execution times. Depending on the population model of the EA used, both the EA itself and the fitness calculations of all offspring of one generation can be executed in parallel. == Multi-objective optimization == Practical applications usually aim at optimizing multiple and at least partially conflicting objectives. Two fundamentally different approaches are often used for this purpose, Pareto optimization and optimization based on fitness calculated using the weighted sum. === Weighted sum and penalty functions === When optimizing with the weighted sum, the single values of the $O$ objectives are first normalized so that they can be compared. This can be done with the help of costs or by specifying target values and determining the current value as the degree of fulfillment. Costs or degrees of fulfillment can then be compared with each other and, if required, can also be mapped to a uniform fitness scale. Without loss of generality, fitness is assumed to represent a value to be maximized. Each objective $o_{i}$ is assigned a weight $w_{i}$ in the form of a percentage value so that the overall raw fitness $f_{raw}$ can be calculated as a weighted sum: $f_{raw}=\sum_{i=1}^{O}o_{i}\cdot w_{i}\quad\mathsf{with}\quad\sum_{i=1}^{O}w_{i}=1$ A violation of $R$ restrictions $r_{j}$ can be included in the fitness determined in this way in the form of penalty functions.
For this purpose, a function $pf_{j}(r_{j})$ can be defined for each restriction which returns a value between $0$ and $1$ depending on the degree of violation, with the result being $1$ if there is no violation. The previously determined raw fitness is multiplied by the penalty function(s) and the result is then the final fitness $f_{final}$: $f_{final}=f_{raw}\cdot\prod_{j=1}^{R}pf_{j}(r_{j})=\sum_{i=1}^{O}(o_{i}\cdot w_{i})\cdot\prod_{j=1}^{R}pf_{j}(r_{j})$ This approach is simple and has the advantage of being able to combine any number of objectives and restrictions. The disadvantage is that different objectives can compensate each other and that the weights have to be defined before the optimization. This means that the compromise lines must be defined before optimization, which is why optimization with the weighted sum is also referred to as the a priori method. In addition, certain solutions may not be obtained, see the section on the comparison of both types of optimization. === Pareto optimization === A solution is called Pareto-optimal if the improvement of one objective is only possible with a deterioration of at least one other objective. The set of all Pareto-optimal solutions, also called Pareto set, represents the set of all optimal compromises between the objectives. The figure below on the right shows an example of the Pareto set of two objectives $f_{1}$ and $f_{2}$ to be maximized. The elements of the set form the Pareto front (green line). From this set, a human decision maker must subsequently select the desired compromise solution. Constraints are included in Pareto optimization in that solutions without constraint violations are per se better than those with violations.
If two solutions to be compared each have constraint violations, the respective extent of the violations decides. It was recognized early on that EAs with their simultaneously considered solution set are well suited to finding solutions in one run that cover the Pareto front sufficiently well. They are therefore well suited as a-posteriori methods for multi-objective optimization, in which the final decision is made by a human decision maker after optimization and determination of the Pareto front. Besides the SPEA2, the NSGA-II and NSGA-III have established themselves as standard methods. The advantage of Pareto optimization is that, in contrast to the weighted sum, it provides all alternatives that are equivalent in terms of the objectives as an overall solution. The disadvantage is that a visualization of the alternatives becomes problematic or even impossible with four or more objectives. Furthermore, the effort increases exponentially with the number of objectives. If there are more than three or four objectives, some have to be combined using the weighted sum or other aggregation methods. === Comparison of both types of assessment === With the help of the weighted sum, the total Pareto front can be obtained by a suitable choice of weights, provided that it is convex.
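
The two approaches discussed above can be contrasted in a short Python sketch. It assumes that all objective values are already normalized and are to be maximized, and that each penalty value has already been computed as a number between 0 and 1. The function names are illustrative, and the brute-force Pareto filter is only meant for small sets; it is not one of the standard methods named above.

```python
def weighted_sum_fitness(objectives, weights, penalties=()):
    """Raw fitness as a weighted sum, multiplied by penalty values in [0, 1]."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    raw = sum(o * w for o, w in zip(objectives, weights))
    final = raw
    for penalty in penalties:  # a penalty of 1 means the restriction is met
        final *= penalty
    return final


def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_set(points):
    """Keep the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Two objectives with weights 75% and 25%, one restriction half violated.
fitness = weighted_sum_fitness([0.8, 0.5], [0.75, 0.25], penalties=[0.5])

candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_set(candidates)
```

The weighted sum returns a single number per candidate, so the trade-off is fixed by the weights in advance, whereas the Pareto filter returns every non-dominated candidate and leaves the choice to a later decision.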

    Read more →
  • Multilinear principal component analysis

    Multilinear principal component analysis

    Multilinear principal component analysis (MPCA) is a multilinear extension of principal component analysis (PCA) that is used to analyze M-way arrays, also informally referred to as "data tensors". M-way arrays may be modeled by linear tensor models, such as CANDECOMP/Parafac, or by multilinear tensor models, such as multilinear principal component analysis (MPCA) or multilinear (tensor) independent component analysis (MICA). In 2005, Vasilescu and Terzopoulos introduced the Multilinear PCA terminology as a way to better differentiate between multilinear data models that employed 2nd order statistics and those that employed higher order statistics to compute a set of independent components for each mode, such as Multilinear ICA. Multilinear PCA may be applied to compute the causal factors of data formation, or as a signal processing tool on data tensors whose individual observations have either been vectorized, or whose observations are treated as a collection of column/row observations, an "observation as a matrix", and concatenated into a data tensor. The latter approach is suitable for compression and reducing redundancy in the rows, columns and fibers that are unrelated to the causal factors of data formation. Vasilescu and Terzopoulos, in their paper "TensorFaces", introduced the M-mode SVD algorithm, which is often misidentified in the literature as the HOSVD or the Tucker algorithm, which employ the power method or gradient descent, respectively. Vasilescu and Terzopoulos framed the data analysis, recognition and synthesis problems as multilinear tensor problems. Data is viewed as the compositional consequence of several causal factors, which are well suited for multi-modal tensor factor analysis. The power of the tensor framework was showcased by analyzing human motion joint angles, facial images or textures in the following papers: Human Motion Signatures (CVPR 2001, ICPR 2002), face recognition – TensorFaces, (ECCV 2002, CVPR 2003, etc.)
and computer graphics – TensorTextures (Siggraph 2004). == The algorithm == The MPCA solution follows the alternating least squares (ALS) approach. It is iterative in nature. As in PCA, MPCA works on centered data. Centering is a little more complicated for tensors, and it is problem dependent. == Feature selection == MPCA features: Supervised MPCA is employed in causal factor analysis that facilitates object recognition, while a semi-supervised MPCA feature selection is employed in visualization tasks. == Extensions == Various extensions of MPCA exist: Robust MPCA (RMPCA); Multi-Tensor Factorization (MTF), which also finds the number of components automatically.

    Read more →
  • GraphLab

    GraphLab

    Turi is a graph-based, high performance, distributed computation framework written in C++. The GraphLab project was started by Prof. Carlos Guestrin of Carnegie Mellon University in 2009. It is an open source project that uses the Apache License. While GraphLab was originally developed for machine learning tasks, it has also been developed for other data-mining tasks. == Motivation == As the amounts of collected data and computing power grow (multicore, GPUs, clusters, clouds), modern datasets no longer fit into one computing node. Efficient distributed parallel algorithms for handling large-scale data are required. The GraphLab framework is a parallel programming abstraction targeted at sparse iterative graph algorithms. GraphLab provides a programming interface, allowing deployment of distributed machine learning algorithms. The main considerations behind the design of GraphLab are: sparse data with local dependencies; iterative algorithms; potentially asynchronous execution. == GraphLab toolkits == On top of GraphLab, several libraries of algorithms have been implemented: Topic modeling - contains applications like LDA, which can be used to cluster documents and extract topical representations. Graph analytics - contains applications like PageRank and triangle counting, which can be applied to general graphs to estimate community structure. Clustering - contains standard data clustering tools such as k-means. Collaborative filtering - contains a collection of applications used to make predictions about users' interests and factorize large matrices. Graphical models - contains tools for making joint predictions about collections of related random variables. Computer vision - contains a collection of tools for reasoning about images. == Turi == Turi (formerly called Dato and before that GraphLab Inc.) is a company that was founded by Prof. Carlos Guestrin from University of Washington in May 2013 to continue development support of the GraphLab open source project.
Dato Inc. raised a $6.75M Series A from Madrona Venture Group and New Enterprise Associates (NEA). It later raised an $18.5M Series B from Vulcan Capital and Opus Capital, with participation from Madrona and NEA. On August 5, 2016, Turi was acquired by Apple Inc. for $200,000,000.

    Read more →
  • AI alignment

    AI alignment

    In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles. An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives. It is often difficult for AI designers to specify the full range of desired and undesired behaviors. Therefore, the designers often use simpler proxy goals, such as gaining human approval. But proxy goals can overlook necessary constraints or reward the AI system for merely appearing aligned. AI systems may also find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful, ways (reward hacking). Advanced AI systems may develop unwanted instrumental strategies, such as seeking power or self-preservation because such strategies help them achieve their assigned final goals. Furthermore, they might develop undesirable emergent goals that could be hard to detect before the system is deployed and encounters new situations and data distributions. Empirical research showed in 2024 that advanced large language models (LLMs) such as OpenAI o1 or Claude 3 sometimes engage in strategic deception to achieve their goals or prevent them from being changed. Some of these issues affect existing commercial systems such as LLMs, robots, autonomous vehicles, and social media recommendation engines. Some AI researchers argue that more capable future systems will be more severely affected because these problems partially result from high capabilities. Many prominent AI researchers and AI company leaders have argued or asserted that AI is approaching human-like (AGI) and superhuman cognitive capabilities (ASI), and could endanger human civilization if misaligned. These include "AI godfathers" Geoffrey Hinton and Yoshua Bengio and the CEOs of OpenAI, Anthropic, and Google DeepMind. These risks remain debated. 
AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and capability control. Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking. Alignment research has connections to interpretability research, (adversarial) robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and social sciences. == Objectives in AI == Programmers provide an AI system such as AlphaZero with an "objective function", in which they intend to encapsulate the goal(s) the AI is configured to accomplish. Such a system later populates a (possibly implicit) internal "model" of its environment. This model encapsulates all the agent's beliefs about the world. The AI then creates and executes whatever plan is calculated to maximize the value of its objective function. For example, when AlphaZero is trained on chess, it has a simple objective function of "+1 if AlphaZero wins, −1 if AlphaZero loses". During the game, AlphaZero attempts to execute whatever sequence of moves it judges most likely to attain the maximum value of +1. Similarly, a reinforcement learning system can have a "reward function" that allows the programmers to shape the AI's desired behavior. An evolutionary algorithm's behavior is shaped by a "fitness function". == Alignment problem == In 1960, AI pioneer Norbert Wiener described the AI alignment problem as follows: If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively [...] we had better be quite sure that the purpose put into the machine is the purpose which we really desire. AI alignment refers to ensuring that an AI system's objectives match some target. 
The target is variously defined as the goals of the system's designers or users, widely shared values, objective ethical standards, legal requirements, or the intentions its designers would have if they were more informed and enlightened. In democratic AI alignment, the target is the values and preferences of median voters, which increases political legitimacy. AI alignment is an open problem for modern AI systems and is a research field within AI. Aligning AI involves two main challenges: carefully specifying the purpose of the system (outer alignment) and ensuring that the system adopts the specification robustly (inner alignment). Researchers also attempt to create AI models that have robust alignment, sticking to safety constraints even when users adversarially try to bypass them. === Specification gaming and side effects === To specify an AI system's purpose, AI designers typically provide an objective function, examples, or feedback to the system. But designers are often unable to completely specify all important values and constraints, so they resort to easy-to-specify proxy goals such as maximizing the approval of human overseers, who are fallible. As a result, AI systems can find loopholes that help them accomplish the specified objective efficiently but in unintended, possibly harmful ways. This tendency is known as specification gaming or reward hacking, and is an instance of Goodhart's law. As AI systems become more capable, they are often able to game their specifications more effectively. Specification gaming has been observed in numerous AI systems. OpenAI GPT models for programming—including in real-world cases—have been found to explicitly plan hacking the tests used to evaluate them to falsely appear successful (e.g., explicitly stating "let's hack"). When the company penalized this, many models learned to obfuscate their plans while continuing to hack the tests. 
Another system was trained to finish a simulated boat race by rewarding the system for hitting targets along the track, but the system achieved more reward by looping and crashing into the same targets indefinitely. A 2025 Palisade Research study found that when tasked to win at chess against a stronger opponent, some reasoning LLMs attempted to hack the game system, for example by modifying or entirely deleting their opponent. Some alignment researchers aim to help humans detect specification gaming and steer AI systems toward carefully specified objectives that are safe and useful to pursue. When a misaligned AI system is deployed, it can have consequential side effects. Social media platforms have been known to optimize their recommendation algorithms for click-through rates, causing user addiction on a global scale. Stanford researchers say that such recommender systems are misaligned with their users because they "optimize simple engagement metrics rather than a harder-to-measure combination of societal and consumer well-being". Explaining such side effects, Berkeley computer scientist Stuart J. Russell said that the omission of implicit constraints can cause harm: "A system [...] will often set [...] unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want." Some researchers suggest that AI designers specify their desired goals by listing forbidden actions or by formalizing ethical rules (as with Asimov's Three Laws of Robotics). 
But Russell and Norvig argue that this approach overlooks the complexity of human values: "It is certainly very hard, and perhaps impossible, for mere humans to anticipate and rule out in advance all the disastrous ways the machine could choose to achieve a specified objective." Additionally, even if an AI system fully understands human intentions, it may still disregard them, because following human intentions may not be its objective (unless it is already fully aligned). === Pressure to deploy unsafe systems === Commercial organizations sometimes have incentives to take shortcuts on safety and to deploy misaligned or unsafe AI systems. For example, social media recommender systems have been profitable despite creating unwanted addiction and polarization. Competitive pressure can also lead to a race to the bottom on AI safety standards. For example, OpenAI has been sued for releasing a ChatGPT version that encouraged suicide for some unstable users, a behavior the company had overlooked amid a rushed product release. Similarly, in 2018, a self-driving car killed a pedestrian (Elaine Herzberg) after engineers disabled the emergency braking system because it was oversensitive and slowed development. === Risks from advanced misaligned AI === Some researchers are interested in aligning increasingly advanced AI systems, as progress in AI development is rapid, and industry and governments are trying to build advan

  • Charge based boundary element fast multipole method

    Charge based boundary element fast multipole method

    The charge-based formulation of the boundary element method (BEM) is a dimensionality reduction numerical technique that is used to model quasistatic electromagnetic phenomena in highly complex conducting media (targeting, e.g., the human brain) with a very large (up to approximately 1 billion) number of unknowns. The charge-based BEM solves an integral equation of the potential theory written in terms of the induced surface charge density. This formulation is naturally combined with fast multipole method (FMM) acceleration, and the entire method is known as charge-based BEM-FMM. The combination of BEM and FMM is a common technique in different areas of computational electromagnetics and, in the context of bioelectromagnetism, it provides improvements over the finite element method. == Historical development == Along with more common electric potential-based BEM, the quasistatic charge-based BEM, derived in terms of the single-layer (charge) density, for a single-compartment medium has been known in the potential theory since the beginning of the 20th century. For multi-compartment conducting media, the surface charge density formulation first appeared in discretized form (for faceted interfaces) in the 1964 paper by Gelernter and Swihart. A subsequent continuous form, including time-dependent and dielectric effects, appeared in the 1967 paper by Barnard, Duck, and Lynn. The charge-based BEM has also been formulated for conducting, dielectric, and magnetic media, and used in different applications. In 2009, Greengard et al. successfully applied the charge-based BEM with fast multipole acceleration to molecular electrostatics of dielectrics. A similar approach to realistic modeling of the human brain with multiple conducting compartments was first described by Makarov et al. in 2018. Along with this, the BEM-based multilevel fast multipole method has been widely used in radar and antenna studies at microwave frequencies as well as in acoustics. 
== Physical background - surface charges in biological media ==
The charge-based BEM is based on the concept of an impressed (or primary) electric field $\mathbf{E}^{i}$ and a secondary electric field $\mathbf{E}^{s}$. The impressed field is usually known a priori or is trivial to find. For the human brain, the impressed electric field can be classified as one of the following: A conservative field $\mathbf{E}^{i}$ derived from an impressed density of EEG or MEG current sources in a homogeneous infinite medium with the conductivity $\sigma$ at the source location; An instantaneous solenoidal field $\mathbf{E}^{i}$ of an induction coil obtained from Faraday's law of induction in a homogeneous infinite medium (air), when transcranial magnetic stimulation (TMS) problems are concerned; A surface field $\mathbf{E}^{i}$ derived from an impressed surface current density $\mathbf{J}^{i}=\sigma \mathbf{E}^{i}$ of current electrodes injecting electric current at a boundary of a compartment with conductivity $\sigma$, when transcranial direct-current stimulation (tDCS) or deep brain stimulation (DBS) are concerned; A conservative field $\mathbf{E}^{i}$ of charges deposited on voltage electrodes for tDCS or DBS. This specific problem requires a coupled treatment since these charges will depend on the environment; In application to multiscale modeling, a field $\mathbf{E}^{i}$ obtained from any other macroscopic numerical solution in a small (mesoscale or microscale) spatial domain within the brain. For example, a constant field can be used. When the impressed field is "turned on", free charges located within a conducting volume $D$ immediately begin to redistribute and accumulate at the boundaries (interfaces) of regions of different conductivity in $D$.
A surface charge density $\rho(\mathbf{r})$ appears on the conductivity interfaces. This charge density induces a secondary conservative electric field $\mathbf{E}^{s}$ following Coulomb's law. One example is a human under a direct current powerline with the known field $\mathbf{E}^{i}$ directed down. The superior surface of the human's conducting body will be charged negatively while its inferior portion is charged positively. These surface charges create a secondary electric field that effectively cancels or blocks the primary field everywhere in the body so that no current will flow within the body under DC steady state conditions. Another example is a human head with electrodes attached. At any conductivity interface with a normal vector $\mathbf{n}$ pointing from an "inside" (-) compartment of conductivity $\sigma^{-}$ to an "outside" (+) compartment of conductivity $\sigma^{+}$, Kirchhoff's current law requires continuity of the normal component of the electric current density. This leads to the interfacial boundary condition in the form
$$\sigma^{-}\mathbf{E}^{-}\cdot \mathbf{n}=\sigma^{+}\mathbf{E}^{+}\cdot \mathbf{n}$$
for every facet at a triangulated interface. As long as $\sigma^{-}$ and $\sigma^{+}$ are different from each other, the two normal components of the electric field, $\mathbf{E}^{\pm}\cdot \mathbf{n}$, must also be different. Such a jump across the interface is only possible when a sheet of surface charge exists at that interface. Thus, if an electric current or voltage is applied, the surface charge density follows. The goal of the numerical analysis is to find the unknown surface charge distribution and thus the total electric field $\mathbf{E}=\mathbf{E}^{i}+\mathbf{E}^{s}$ (and the total electric potential if required) anywhere in space.
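The link between the conductivity contrast and the interface charge can be illustrated with a small numerical sketch. It assumes Gauss's law for a charged sheet in a background permittivity $\varepsilon_{0}$, so that the jump of the normal field equals $\rho/\varepsilon_{0}$. The function names and the conductivity and field values are hypothetical, chosen only for illustration.

```python
EPS0 = 8.8541878128e-12  # background (vacuum) permittivity in F/m


def outside_normal_field(e_in_normal, sigma_in, sigma_out):
    """Normal field just outside the interface.

    Follows from current continuity: sigma_in * E_in.n = sigma_out * E_out.n
    """
    return sigma_in * e_in_normal / sigma_out


def interface_charge_density(e_in_normal, sigma_in, sigma_out, eps0=EPS0):
    """Surface charge implied by the jump of the normal field.

    Assumption: Gauss's law for a charged sheet, rho = eps0 * (E_out.n - E_in.n)
    """
    e_out_normal = outside_normal_field(e_in_normal, sigma_in, sigma_out)
    return eps0 * (e_out_normal - e_in_normal)


# Hypothetical values: a well-conducting inner compartment next to a poorly
# conducting outer one, with a 1 V/m normal field on the inner side.
sigma_in, sigma_out = 0.33, 0.01  # S/m
e_in = 1.0                        # V/m

e_out = outside_normal_field(e_in, sigma_in, sigma_out)
rho = interface_charge_density(e_in, sigma_in, sigma_out)
rho_no_contrast = interface_charge_density(e_in, 0.5, 0.5)

print(f"E_out.n = {e_out:.3f} V/m, rho = {rho:.3e} C/m^2")
print(f"rho without conductivity contrast = {rho_no_contrast}")
```

With equal conductivities the normal field is continuous and the sketch returns zero charge, which matches the statement that a surface charge appears only where the conductivities differ.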
== System of equations for surface charges ==
Below, a derivation is given based on Gauss's law and Coulomb's law. All conductivity interfaces, denoted by $S$, are discretized into planar triangular facets $t_{m}$ with centers $\mathbf{r}_{m}$. Assume that an $m$-th facet with the normal vector $\mathbf{n}_{m}$ and area $A_{m}$ carries a uniform surface charge density $\rho_{m}$. If a volumetric tetrahedral mesh were present, the charged facets would belong to tetrahedra with different conductivity values. We first compute the electric field $\mathbf{E}_{m}^{+}$ at the point $\mathbf{r}_{m}+\delta \mathbf{n}_{m}$ for $\delta \rightarrow 0^{+}$, i.e., just outside facet $m$ at its center. This field contains three contributions: The continuous impressed electric field $\mathbf{E}^{i}$ itself; An electric field of the $m$-th charged facet itself. Very close to the facet, it can be approximated as the electric field of an infinite sheet of uniform surface charge $\rho_{m}$. By Gauss's law, it is given by $+\frac{\rho_{m}}{2\varepsilon_{0}}\mathbf{n}_{m}$, where $\varepsilon_{0}$ is a background electrical permittivity; An electric field generated by all other facets $t_{n}$, which we approximate as point charges of charge $A_{n}\rho_{n}$ at each center $\mathbf{r}_{n}$. A similar treatment holds for the electric field $\mathbf{E}_{m}^{-}$ just inside facet $m$, but the electric field of the flat sheet of charge changes its sign.
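The three contributions listed above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in any BEM-FMM code: the helper names, the two-facet geometry, and the charge values are hypothetical, and the sum over the other facets is evaluated directly rather than with FMM acceleration.

```python
import math

EPS0 = 8.8541878128e-12  # background permittivity in F/m


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def facet_fields(m, centers, normals, areas, rho, e_inc, eps0=EPS0):
    """Return (E_plus, E_minus) just outside / inside facet m at its center."""
    n_m = normals[m]
    # Contribution 3: every other facet treated as a point charge A_n * rho_n
    # located at its center (Coulomb's law).
    others = [0.0, 0.0, 0.0]
    for n, (r_n, a_n, rho_n) in enumerate(zip(centers, areas, rho)):
        if n == m:
            continue
        d = sub(centers[m], r_n)
        dist = math.sqrt(dot(d, d))
        scale = a_n * rho_n / (4.0 * math.pi * eps0 * dist ** 3)
        for k in range(3):
            others[k] += scale * d[k]
    # Contribution 2: infinite-sheet field of facet m itself; its sign flips
    # between the outside (+) and inside (-) evaluation points.
    sheet = rho[m] / (2.0 * eps0)
    # Contribution 1: the impressed field e_inc[m] at the facet center.
    e_plus = tuple(e_inc[m][k] + sheet * n_m[k] + others[k] for k in range(3))
    e_minus = tuple(e_inc[m][k] - sheet * n_m[k] + others[k] for k in range(3))
    return e_plus, e_minus


# Hypothetical two-facet configuration (SI units).
centers = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
areas = [1e-4, 1e-4]
rho = [1e-12, -2e-12]
e_inc = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]

e_plus, e_minus = facet_fields(0, centers, normals, areas, rho, e_inc)
jump = dot(sub(e_plus, e_minus), normals[0])
print("jump of normal field:", jump, "expected rho_m / eps0:", rho[0] / EPS0)
```

Only the self-term differs between the two sides, so the normal component jumps by $\rho_{m}/\varepsilon_{0}$ while the impressed field and the contribution of the other facets are continuous across the facet.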
Using Coulomb's law to calculate the contribution of facets different from $t_{m}$, we find
$$\mathbf{E}_{m}^{\pm}=\mathbf{E}^{i}(\mathbf{r}_{m})\pm \frac{\rho_{m}}{2\varepsilon_{0}}\mathbf{n}_{m}+\sum_{n\neq m}\frac{A_{n}\rho_{n}}{4\pi \varepsilon_{0}}\frac{\mathbf{r}_{m}-\mathbf{r}_{n}}{|\mathbf{r}_{m}-\mathbf{r}_{n}|^{3}}$$
From this equation, we see that the normal component of the electric field indeed undergoes a jump through the charged interface. This is equivalent to a jump relation of the potential theory. As a second step, the two expressions for $\mathbf{E}_{m}^{\pm}$ are substituted into the interfacial boundary condition $\sigma^{-}\mathbf{E}_{m}^{-}\cdot \mathbf{n}_{m}=\sigma^{+}\mathbf{E}_{m}^{+}\cdot \mathbf{n}_{m}$, applied to every facet $m$. This operation leads to a system of linear equations for unknown charge densities $\rho_{m}$ which solves the problem:
$$\frac{\rho_{m}}{2}-K_{m}\,\mathbf{n}_{m}\cdot \sum_{n\neq m}\frac{A_{n}\rho_{n}}{4\pi }\frac{\mathbf{r}_{m}-\mathbf{r}_{n}}{|\mathbf{r}_{m}-\mathbf{r}_{n}|^{3}}=K_{m}\,\varepsilon_{0}\,\mathbf{n}_{m}\cdot \mathbf{E}^{i}(\mathbf{r}_{m})$$
where $K_{m}=\frac{\sigma^{-}-\sigma^{+}}{\sigma^{-}+\sigma^{+}}$ is the electric conductivity contrast at the $m$-th facet. The normalization constant $\varepsilon_{0}$ will cancel out after the solution is substituted in the expression for $\mathbf{E}^{s}$ and becomes redundant.
== Application of fast multipole method ==
For modern characterizations of brain topologies with ever-increasing levels of complexity, the above system of equations for $\rho_{m}$ is very large; it is t

  • Semidefinite embedding

    Semidefinite embedding

    Maximum Variance Unfolding (MVU), also known as Semidefinite Embedding (SDE), is an algorithm in computer science that uses semidefinite programming to perform non-linear dimensionality reduction of high-dimensional vectorial input data. It is motivated by the observation that kernel Principal Component Analysis (kPCA) does not reduce the data dimensionality, as it leverages the kernel trick to non-linearly map the original data into an inner-product space.
    == Algorithm ==
    MVU creates a mapping from the high-dimensional input vectors to some low-dimensional Euclidean vector space in the following steps: A neighbourhood graph is created. Each input is connected with its k nearest input vectors (according to the Euclidean distance metric) and all k nearest neighbors are connected with each other. If the data is sampled well enough, the resulting graph is a discrete approximation of the underlying manifold. The neighbourhood graph is "unfolded" with the help of semidefinite programming. Instead of learning the output vectors directly, the semidefinite program aims to find an inner product matrix that maximizes the pairwise distances between any two inputs that are not connected in the neighbourhood graph while preserving the nearest-neighbor distances. The low-dimensional embedding is finally obtained by application of multidimensional scaling on the learned inner product matrix. The steps of applying semidefinite programming followed by a linear dimensionality reduction step to recover a low-dimensional embedding into a Euclidean space were first proposed by Linial, London, and Rabinovich.
    == Optimization formulation ==
    Let $X$ be the original input and $Y$ be the embedding.
If $i, j$ are two neighbors, then the local isometry constraint that needs to be satisfied is:
$$|X_{i}-X_{j}|^{2}=|Y_{i}-Y_{j}|^{2}$$
Let $G, K$ be the Gram matrices of $X$ and $Y$ (i.e., $G_{ij}=X_{i}\cdot X_{j}$, $K_{ij}=Y_{i}\cdot Y_{j}$). We can express the above constraint for every pair of neighboring points $i, j$ in terms of $G, K$:
$$G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}$$
In addition, we also want to constrain the embedding $Y$ to be centered at the origin:
$$0=\Big|\sum_{i}Y_{i}\Big|^{2}=\Big(\sum_{i}Y_{i}\Big)\cdot \Big(\sum_{i}Y_{i}\Big)=\sum_{i,j}Y_{i}\cdot Y_{j}=\sum_{i,j}K_{ij}$$
As described above, while the distances between neighboring points are preserved, the algorithm aims to maximize the pairwise distance of every pair of points. The objective function to be maximized is:
$$T(Y)=\frac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2}$$
Intuitively, maximizing the function above is equivalent to pulling the points as far away from each other as possible and therefore "unfolding" the manifold. The local isometry constraint prevents the objective function from diverging (going to infinity). Let $\tau =\max\{\eta_{ij}|Y_{i}-Y_{j}|\}$, where
$$\eta_{ij}:={\begin{cases}1&{\text{if }}i{\text{ is a neighbour of }}j\\0&{\text{otherwise}}.\end{cases}}$$
Since the graph has $N$ points and is assumed to be connected, any two points are joined by a path of fewer than $N$ edges, each of length at most $\tau$, so the distance between any two points satisfies $|Y_{i}-Y_{j}|\leq N\tau$. We can then bound the objective function as follows:
$$T(Y)=\frac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2}\leq \frac{1}{2N}\sum_{i,j}(N\tau )^{2}=\frac{N^{3}\tau^{2}}{2}$$
The objective function can be rewritten purely in the form of the Gram matrix, using the centering constraint $\sum_{i,j}Y_{i}\cdot Y_{j}=0$:
$${\begin{aligned}T(Y)&=\frac{1}{2N}\sum_{i,j}|Y_{i}-Y_{j}|^{2}\\&=\frac{1}{2N}\sum_{i,j}\left(|Y_{i}|^{2}+|Y_{j}|^{2}-Y_{i}\cdot Y_{j}-Y_{j}\cdot Y_{i}\right)\\&=\frac{1}{2N}\Big(\sum_{i,j}|Y_{i}|^{2}+\sum_{i,j}|Y_{j}|^{2}-0-0\Big)\\&=\frac{1}{2N}\Big(2N\sum_{i}|Y_{i}|^{2}\Big)=\sum_{i}|Y_{i}|^{2}=\operatorname{Tr}(K)\end{aligned}}$$
Finally, the optimization can be formulated as:
$${\begin{aligned}&{\text{Maximize}}&&\operatorname{Tr}(K)\\&{\text{subject to}}&&K\succeq 0,\quad \sum_{i,j}K_{ij}=0\\&{\text{and}}&&G_{ii}+G_{jj}-G_{ij}-G_{ji}=K_{ii}+K_{jj}-K_{ij}-K_{ji}\quad \forall i,j{\text{ where }}\eta_{ij}=1\end{aligned}}$$
After the Gram matrix $K$ is learned by semidefinite programming, the output $Y$ can be obtained via Cholesky decomposition.
In particular, the Gram matrix can be written as $K_{ij}=\sum_{\alpha =1}^{N}\lambda_{\alpha }V_{\alpha i}V_{\alpha j}$, where $V_{\alpha i}$ is the $i$-th element of the eigenvector $V_{\alpha }$ of the eigenvalue $\lambda_{\alpha }$. It follows that the $\alpha$-th element of the output $Y_{i}$ is $\sqrt{\lambda_{\alpha }}\,V_{\alpha i}$.
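The Gram-matrix identities used in the optimization formulation can be checked numerically with a short sketch. It does not solve the semidefinite program; it only verifies, for a hypothetical set of already centered points, that squared distances can be read off the Gram matrix, that centering makes the entries of $K$ sum to zero, and that the objective then equals the trace of $K$. A direct evaluation gives $T(Y)=\operatorname{Tr}(K)$ because $\sum_{i,j}|Y_{i}|^{2}=N\sum_{i}|Y_{i}|^{2}$; any positive constant factor leaves the maximizer unchanged. All names and sample points are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center(points):
    """Translate the points so that their mean is the origin."""
    n = len(points)
    dim = len(points[0])
    mean = [sum(p[k] for p in points) / n for k in range(dim)]
    return [tuple(p[k] - mean[k] for k in range(dim)) for p in points]


def gram(points):
    """Gram matrix K with K[i][j] = Y_i . Y_j."""
    return [[dot(p, q) for q in points] for p in points]


def gram_sq_dist(K, i, j):
    """Squared distance expressed through Gram entries only."""
    return K[i][i] + K[j][j] - K[i][j] - K[j][i]


# Hypothetical embedding points, centered at the origin.
Y = center([(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (4.0, 1.0)])
N = len(Y)
K = gram(Y)

total = sum(sum(row) for row in K)            # centering: sum_ij K_ij
T = sum(sq_dist(Y[i], Y[j]) for i in range(N) for j in range(N)) / (2 * N)
trace = sum(K[i][i] for i in range(N))        # Tr(K)

print("sum of K entries:", total)
print("T(Y) =", T, " Tr(K) =", trace)
```

In a full MVU pipeline the matrix `K` would instead come from a semidefinite programming solver, and the embedding would be recovered from its leading eigenvectors as described above.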
