Query rewriting

Query rewriting is a typically automatic transformation that takes a set of database tables, views, and/or queries, usually together with indices, often with gathered data and query statistics and other metadata, and yields a set of different queries that produce the same results but execute with better performance (for example, faster, or with lower memory use). Query rewriting can be based on relational algebra or an extension thereof (e.g. multiset relational algebra with sorting, aggregation, and three-valued predicates, i.e. NULLs, as in the case of SQL). It exploits the equivalence rules of relational algebra: different query structures and orderings can be mathematically proven to yield the same result. For example, filtering on fields A and B, or cross joining R and S, can be done in either order, but there can be a performance difference. Multiple operations may be combined, and the order of operations may be altered. The result of query rewriting may not be at the same abstraction level or application programming interface (API) as the original set of queries (though it often is). For example, the input queries may be in relational algebra or SQL, and the rewritten queries may be closer to the physical representation of the data, e.g. array operations. Query rewriting can also involve materialization of views and other subqueries, operations that may or may not be available to the API user. The transformation can be aided by creating indices from which the optimizer can choose (some database systems create their own indices if deemed useful), mandating the use of specific indices, creating materialized and/or denormalized views, or helping a database system gather statistics on the data and query use, since optimality depends on patterns in the data and typical query usage. Query rewriting may be rule based or optimizer based.
Some sources discuss query rewriting as a distinct step prior to optimization, operating at the level of the user-accessible algebra API (e.g. SQL). There are other, largely unrelated concepts with similar names, for example, query rewriting by search engines.
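
As a minimal sketch of the equivalence idea (not tied to any particular optimizer), the following Python script uses the standard-library `sqlite3` module to run two differently structured queries and check that they return the same rows. The tables `R` and `S`, their columns, and the data are assumptions made up for illustration. The first query filters after a cross join; the second applies each filter before the join and swaps the join operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables R(a) and S(b) with a little sample data.
cur.execute("CREATE TABLE R (a INTEGER)")
cur.execute("CREATE TABLE S (b INTEGER)")
cur.executemany("INSERT INTO R VALUES (?)", [(i,) for i in range(10)])
cur.executemany("INSERT INTO S VALUES (?)", [(i,) for i in range(10)])

# Original form: cross join first, then filter on both fields.
original = """
    SELECT R.a, S.b
    FROM R CROSS JOIN S
    WHERE R.a > 6 AND S.b < 2
    ORDER BY R.a, S.b
"""

# Rewritten form: each filter is applied before the join, and the join
# operands are swapped. The equivalence rules say the result is the same.
rewritten = """
    SELECT fr.a, fs.b
    FROM (SELECT b FROM S WHERE b < 2) AS fs
    CROSS JOIN (SELECT a FROM R WHERE a > 6) AS fr
    ORDER BY fr.a, fs.b
"""

rows_original = cur.execute(original).fetchall()
rows_rewritten = cur.execute(rewritten).fetchall()
conn.close()

print(rows_original == rows_rewritten)
```

Both forms return the same rows; which one runs faster depends on the engine, the available indices, and the data, which is exactly the choice a rewriter or optimizer has to make.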

SQLBuddy

SQL Buddy is an open-source web-based application, written primarily in PHP, that allows users to manage both MySQL and SQLite databases through a web browser. The project was well regarded for its easy installation process and friendly user interface. It was also praised for its cross-platform compatibility: users could manage their databases on various operating systems, including Linux, Windows, and macOS. Development of SQL Buddy has stopped; version 1.3.3, released on January 18, 2011, was the final release. No further releases are expected.

Quantum artificial life

Quantum artificial life is the application of quantum algorithms with the ability to simulate biological behavior. Quantum computers offer many potential improvements to processes performed on classical computers, including machine learning and artificial intelligence. Artificial intelligence applications are often inspired by the idea of mimicking human brains through closely related biomimicry. This has been implemented to a certain extent on classical computers (using neural networks), but quantum computers offer many advantages in the simulation of artificial life. Artificial life and artificial intelligence are closely related but differ in their goals: the study of artificial life aims to understand living beings better, while artificial intelligence aims to create intelligent beings. In 2016, Alvarez-Rodriguez et al. developed a proposal for a quantum artificial life algorithm with the ability to simulate life and Darwinian evolution. In 2018, the same research team ran the proposed algorithm on the IBM ibmqx4 quantum computer and obtained promising results. The results accurately simulated a system with the ability to undergo self-replication at the quantum scale.

== Artificial life on quantum computers ==

The growing advancement of quantum computers has led researchers to develop quantum algorithms for simulating life processes. Researchers have designed a quantum algorithm that can accurately simulate Darwinian evolution. Since the complete simulation of artificial life on quantum computers has only been realized by one group, this section focuses on the implementation by Alvarez-Rodriguez, Sanz, Lamata, and Solano on an IBM quantum computer. Individuals were realized as two qubits, one representing the genotype of the individual and the other representing the phenotype.
The genotype is copied to transmit genetic information through generations, and the phenotype depends on the genetic information as well as on the individual's interactions with its environment. To set up the system, the state of the genotype is instantiated by some rotation of an ancillary state ($|0\rangle\langle 0|$). The environment is a two-dimensional spatial grid occupied by individuals and ancillary states. It is divided into cells that can hold one or more individuals. Individuals move throughout the grid and occupy cells randomly; when two or more individuals occupy the same cell, they interact with each other.

=== Self-replication ===

The ability to self-replicate is critical for simulating life. Self-replication occurs when the genotype of an individual interacts with an ancillary state, creating a genotype for a new individual; this genotype interacts with a different ancillary state in order to create the phenotype. During this interaction, one would like to copy some information about the initial state into the ancillary state, but by the no-cloning theorem it is impossible to copy an arbitrary unknown quantum state. However, physicists have derived different methods for quantum cloning which do not require the exact copying of an unknown state. The method implemented by Alvarez-Rodriguez et al. involves cloning the expectation value of some observable.
For a unitary $U$ which copies the expectation value of some set of observables $\mathsf{X}$ of state $\rho$ into a blank state $\rho_e$, the cloning machine is defined by any $(U, \rho_e, \mathsf{X})$ that fulfills the following:

$$\forall \rho \ \forall X \in \mathsf{X}: \quad \bar{X} = \bar{X}_1 = \bar{X}_2$$

where $\bar{X}$ is the mean value of the observable in $\rho$ before cloning, $\bar{X}_1$ is the mean value of the observable in $\rho$ after cloning, and $\bar{X}_2$ is the mean value of the observable in $\rho_e$ after cloning. Note that the cloning machine has no dependence on $\rho$ because we want to be able to clone the expectation of the observables for any initial state. It is important to note that cloning the mean value of the observable transmits more information than is allowed classically. The mean values are defined naturally as:

$$\bar{X} = \mathrm{Tr}[\rho X], \qquad \bar{X}_1 = \mathrm{Tr}[R\,(X \otimes I)], \qquad \bar{X}_2 = \mathrm{Tr}[R\,(I \otimes X)],$$

where $R = U(\rho \otimes \rho_e)U^{\dagger}$. The simplest cloning machine clones the expectation value of $\sigma_z$ in an arbitrary state $\rho = |\psi\rangle\langle\psi|$ to $\rho_e = |0\rangle\langle 0|$ using $U = \mathrm{CNOT}$. This is the cloning machine implemented for self-replication by Alvarez-Rodriguez et al.
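
The CNOT cloning machine above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: it assumes a pure input state, so state vectors are used in place of density matrices, and the helper names are hypothetical. It prepares an arbitrary qubit, applies CNOT with a blank $|0\rangle$ target, and compares the three mean values of $\sigma_z$.

```python
import cmath
import math
import random

def sigma_z_single(psi):
    """Mean value of sigma_z for a single-qubit state a|0> + b|1>."""
    a, b = psi
    return abs(a) ** 2 - abs(b) ** 2

def apply_cnot(state):
    """CNOT with the first qubit as control and the second as target.

    Amplitudes are ordered |00>, |01>, |10>, |11>, so CNOT swaps the
    last two amplitudes.
    """
    c00, c01, c10, c11 = state
    return [c00, c01, c11, c10]

def sigma_z_pair(state):
    """Mean values of (sigma_z x I) and (I x sigma_z) for a two-qubit state."""
    p00, p01, p10, p11 = (abs(c) ** 2 for c in state)
    first = p00 + p01 - p10 - p11
    second = p00 - p01 + p10 - p11
    return first, second

# An arbitrary pure state |psi> (the seed only makes the run repeatable).
random.seed(0)
theta = random.uniform(0.0, math.pi)
phi = random.uniform(0.0, 2.0 * math.pi)
psi = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

# Joint state |psi> x |0>, then the cloning unitary U = CNOT.
joint = [psi[0], 0.0, psi[1], 0.0]
cloned = apply_cnot(joint)

x_bar = sigma_z_single(psi)          # before cloning
x1_bar, x2_bar = sigma_z_pair(cloned)  # original and blank qubit after cloning

print(x_bar, x1_bar, x2_bar)
```

All three printed values agree up to floating-point error, which is the condition $\bar{X} = \bar{X}_1 = \bar{X}_2$ for $X = \sigma_z$.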
The self-replication process clearly only requires interactions between two qubits, and therefore this cloning machine is the only one necessary for self-replication.

=== Interactions ===

Interactions occur between individuals when the two take up the same space on the environmental grid. The presence of interactions between individuals provides an advantage for shorter-lifespan individuals. When two individuals interact, exchanges of information between the two phenotypes may or may not occur, depending on their existing values. When both individuals' control qubits (genotypes) are alike, no information is exchanged. When the control qubits differ, the target qubits (phenotypes) are exchanged between the two individuals. This procedure produces a constantly changing predator-prey dynamic in the simulation. Therefore, long-living qubits, with a larger genetic makeup in the simulation, are at a disadvantage. Since information is only exchanged when interacting with an individual of different genetic makeup, the short-lived population has the advantage.

=== Mutation ===

Mutations exist in the artificial world with limited probability, equivalent to their occurrence in the real world. There are two ways in which an individual can mutate: through random single-qubit rotations and through errors in the self-replication process. Correspondingly, two different operators act on the individual and cause mutations. The M operation causes a spontaneous mutation within the individual by rotating a single qubit by a parameter $\theta$. The parameter $\theta$ is random for each mutation, which creates biodiversity within the artificial environment. The M operation is a unitary matrix which can be described as:

$$M = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}$$

The other possible way for mutations to occur is through errors in the replication process.
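
Before turning to replication errors, the spontaneous mutation operator $M$ given above can be checked numerically. The sketch below is illustrative only; the helper names are hypothetical. It builds $M$ for a random $\theta$, confirms that it is unitary, and applies it to $|0\rangle$.

```python
import math
import random

def mutation_operator(theta):
    """The 2x2 matrix M from the text for a given angle theta."""
    return [[math.cos(theta), math.sin(theta)],
            [math.sin(theta), -math.cos(theta)]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply a 2x2 matrix to a length-2 state vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

random.seed(1)
theta = random.uniform(0.0, 2.0 * math.pi)
M = mutation_operator(theta)

# M is real and symmetric, so its conjugate transpose is M itself and
# unitarity reduces to M * M = identity.
product = matmul2(M, M)

# Acting on |0> gives cos(theta)|0> + sin(theta)|1>.
mutated = apply(M, [1.0, 0.0])

print(product)
print(mutated)
```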
Due to the no-cloning theorem, it is impossible to produce perfect copies of systems that are originally in unknown quantum states. However, quantum cloning machines make it possible to create imperfect copies of quantum states; in other words, the process introduces some degree of error. The error that exists in current quantum cloning machines is the root cause of the second kind of mutation in the artificial life experiment. The imperfect cloning operation can be written as:

$$U_M(\theta) = \mathrm{I}_4 + \frac{1}{2} \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} -1 & 1 \\ 1 & -1 \end{pmatrix} (\cos\theta + i\sin\theta + 1)$$

The two kinds of mutations affect the individual differently. While the spontaneous M operation does not affect the phenotype of the individual, the self-replicating error mutation, $U_M$, alters both the genotype of the individual and its associated lifetime. The presence of mutations in the quantum artificial life experiment is critical for providing randomness and biodiversity. The inclusion of mutations helps to increase the accuracy of the quantum algorithm.

=== Death ===

At the instant the individual is created (when the genotype is copied into the phenotype), the phenotype interacts with the environment. As time evolves, the interaction of the individual with the environment simulates aging, which eventually leads to the death of the individual. The death of an individual occurs when the expectation value of $\sigma_z$ is within some $\epsilon$ of 1 in the phenotype, or, equivalently, when $\rho_p = |0\rangle\langle 0|$. The Lindbladian describes the interaction of the individual with the environment: ρ

Adversarial machine learning

Adversarial machine learning is the study of attacks on machine learning algorithms, and of the defenses against such attacks. Machine learning techniques are mostly designed to work on specific problem sets, under the assumption that the training and test data are generated from the same statistical distribution (IID). However, this assumption is often violated in practical high-stakes applications, where users may intentionally supply fabricated data that violates the statistical assumption. The most common attacks in adversarial machine learning include evasion attacks, data poisoning attacks, Byzantine attacks, and model extraction.

== History ==

At the MIT Spam Conference in January 2004, John Graham-Cumming showed that a machine-learning spam filter could be used to defeat another machine-learning spam filter by automatically learning which words to add to a spam email to get the email classified as not spam. In 2004, Nilesh Dalvi and others noted that linear classifiers used in spam filters could be defeated by simple "evasion attacks" as spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to fuzz words within "image spam" in order to defeat OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", outlining a broad taxonomy of attacks. As late as 2013, many researchers continued to hope that non-linear classifiers (such as support vector machines and neural networks) might be robust to adversaries, until Battista Biggio and others demonstrated the first gradient-based attacks on such machine-learning models (2012–2013). In 2012, deep neural networks began to dominate computer vision problems; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled by adversaries, again using a gradient-based attack to craft adversarial perturbations.
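
To make the idea of a gradient-based perturbation concrete, here is a minimal sketch on a toy linear (logistic) classifier. It is not the specific method of any of the works named above: the weights, the input, and the step size `epsilon` are hypothetical, and the update is a simple gradient-sign step. For a linear model the gradient of the logit with respect to the input is just the weight vector, so the attack moves each feature by `epsilon` against the sign of its weight.

```python
import math

# Hypothetical parameters of a tiny logistic classifier.
weights = [2.0, -1.5, 0.5, 3.0]
bias = -0.5

def score(x):
    """Probability assigned to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def gradient_sign_attack(x, epsilon):
    """Shift every feature by epsilon in the direction that lowers the logit.

    The gradient of the logit with respect to x equals the weight vector,
    so stepping against sign(weights) reduces the positive-class score.
    """
    return [xi - epsilon * sign(w) for xi, w in zip(x, weights)]

x = [0.6, 0.2, 0.4, 0.3]   # hypothetical input classified as positive
epsilon = 0.3              # maximum change allowed per feature
x_adv = gradient_sign_attack(x, epsilon)

print(score(x) > 0.5, score(x_adv) > 0.5)
```

In this toy setting the bounded perturbation is enough to flip the predicted class; attacks on deep networks follow the same principle but obtain the gradient by backpropagation.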
Further work would show that adversarial attacks are harder to produce in uncontrolled environments, due to the different environmental constraints that cancel out the effect of noise. For example, any small rotation or slight change in illumination applied to an adversarial image can destroy the adversariality. In addition, researchers such as Google Brain's Nick Frosst point out that it is much easier to make self-driving cars miss stop signs by physically removing the sign itself than by creating adversarial examples. Frosst also believes that the adversarial machine learning community incorrectly assumes models trained on a certain data distribution will also perform well on a completely different data distribution. He suggests that a new approach to machine learning should be explored, and is currently working on a unique neural network that has characteristics more similar to human perception than state-of-the-art approaches. While adversarial machine learning continues to be heavily rooted in academia, large tech companies such as Google, Microsoft, and IBM have begun curating documentation and open source code bases to allow others to concretely assess the robustness of machine learning models and minimize the risk of adversarial attacks.

=== Examples ===

Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words; attacks in computer security, such as obfuscating malware code within network packets or modifying the characteristics of a network flow to mislead intrusion detection; and attacks in biometric recognition, where fake biometric traits may be exploited to impersonate a legitimate user or to compromise users' template galleries that adapt to updated traits over time. Researchers showed that by changing only one pixel it was possible to fool deep learning algorithms.
Others 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed. Creating the turtle required only low-cost commercially available 3-D printing technology. A machine-tweaked image of a dog was shown to look like a cat to both computers and humans. A 2019 study reported that humans can guess how machines will classify adversarial images. Researchers discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle classified it as a merge or speed limit sign. A data poisoning filter called Nightshade was released in 2023 by researchers at the University of Chicago. It was created for visual artists to apply to their artwork in order to corrupt the data sets of text-to-image models, which usually scrape their data from the internet without the consent of the image creator. McAfee attacked Tesla's former Mobileye system, fooling it into driving 50 mph over the speed limit, simply by adding a two-inch strip of black tape to a speed limit sign. Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers have led to a niche industry of "stealth streetwear". An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system. Researchers can also create adversarial audio inputs to disguise commands to intelligent assistants in benign-seeming audio; a parallel literature explores human perception of such stimuli. Clustering algorithms are used in security applications. Malware and computer virus analysis aims to identify malware families and to generate specific detection signatures. In the context of malware detection, researchers have proposed methods for adversarial malware generation that automatically craft binaries to evade learning-based detectors while preserving malicious functionality.
Optimization-based attacks such as GAMMA use genetic algorithms to inject benign content (for example, padding or new PE sections) into Windows executables, framing evasion as a constrained optimization problem that balances misclassification success with the size of the injected payload and showing transferability to commercial antivirus products. Complementary work uses generative adversarial networks (GANs) to learn feature-space perturbations that cause malware to be classified as benign; Mal-LSGAN, for instance, replaces the standard GAN loss with a least-squares objective and modified activation functions to improve training stability and produce adversarial malware examples that substantially reduce true positive rates across multiple detectors.

== Challenges in applying machine learning to security ==

Researchers have observed that the constraints under which machine-learning techniques function in the security domain are different from those of common benchmark domains. Security data may change over time, include mislabeled samples, or reflect adversarial behavior, which complicates evaluation and reproducibility.

=== Data collection issues ===

Security datasets vary across formats, including binaries, network traces, and log files. Studies have reported that the process of converting these sources into features can introduce bias or inconsistencies. In addition, time-based leakage can occur when related malware samples are not properly separated across training and testing splits, which may lead to overly optimistic results.

=== Labeling and ground truth challenges ===

Malware labels are often unstable because different antivirus engines may classify the same sample in conflicting ways. Ceschin et al. note that families may be renamed or reorganized over time, causing further discrepancies in ground truth and reducing the reliability of benchmarks.
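
One common way to limit the time-based leakage noted under data collection issues is to split on time rather than at random. The following is a minimal sketch with hypothetical sample records and a hypothetical cutoff date; it handles only temporal ordering and does not address family-level relatedness between samples.

```python
from datetime import date

# Hypothetical samples: (identifier, first-seen date, label).
samples = [
    ("s1", date(2020, 1, 10), "malicious"),
    ("s2", date(2020, 3, 5), "benign"),
    ("s3", date(2020, 9, 21), "malicious"),
    ("s4", date(2021, 2, 14), "benign"),
    ("s5", date(2021, 6, 30), "malicious"),
    ("s6", date(2021, 11, 2), "benign"),
]

def temporal_split(records, cutoff):
    """Train on samples first seen before the cutoff, test on the rest.

    Every test sample is then at least as recent as every training
    sample, so the model is never evaluated on data older than what
    it was trained on.
    """
    train = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return train, test

train, test = temporal_split(samples, date(2021, 1, 1))

print([r[0] for r in train])
print([r[0] for r in test])
```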

=== Concept drift ===

Because malware creators continuously adapt their techniques, the statistical properties of malicious samples also change. This form of concept drift has been widely documented and may reduce model performance unless systems are updated regularly or incorporate mechanisms for incremental learning.

=== Feature robustness ===

Researchers differentiate between features that can be easily manipulated and those that are more resistant to modification. For example, simple static attributes, such as header fields, may be altered by attackers, while structural features, such as control-flow graphs, are generally more stable but computationally expensive to extract.

=== Class imbalance ===

In realistic deployment environments, the proportion of malicious samples can be extremely low, ranging from 0.01% to 2% of total data. This unbalanced distribution causes models to develop a bias towards the majority class, achieving high accuracy but failing to identify malicious samples. Prior approaches to this problem have included both data-level solutions and sequence-specific models. Methods like n-gram models and Long Short-Term Memory (LSTM) networks can model sequential data, but their performance has been shown to decline significantly when malware samples are realistically proportioned in the training set, demonstrating the limitations in

Personality computing

Personality computing is a research field related to artificial intelligence and personality psychology that studies personality by means of computational techniques applied to different sources, including text, multimedia, and social networks.

== Overview ==

Personality computing addresses three main problems involving personality: automatic personality recognition, perception, and synthesis. Automatic personality recognition is the inference of the personality type of target individuals from their digital footprint. Automatic personality perception is the inference of the personality attributed by an observer to a target individual based on some observable behavior. Automatic personality synthesis is the generation of the style or behaviour of artificial personalities in avatars and virtual agents. Self-assessed personality tests or observer ratings are used as the ground truth for testing and validating the performance of artificial intelligence algorithms for the automatic prediction of personality types. There is a wide variety of personality tests, such as the Myers-Briggs Type Indicator (MBTI) or the MMPI, but the most used are tests based on the Five Factor Model, such as the Revised NEO Personality Inventory. Personality computing can be considered an extension or complement of affective computing, where the former focuses on personality traits and the latter on affective states. A further extension of the two fields is character computing, which combines various character states and traits including, but not limited to, personality and affect.

== History ==

Personality computing began around 2005 with the pioneering research in personality recognition by Shlomo Argamon and later by François Mairesse. These works showed that personality traits could be inferred with reasonable accuracy from text, such as blogs, self-presentations, and email addresses.
In 2008, the concept of "portable personality" for the distributed management of personality profiles was developed. A few years later, research began in personality recognition and perception from multimodal and social signals, such as recorded meetings and voice calls. In the 2010s, research focused mainly on personality recognition and perception from social media, helped by the first workshops organized by Fabio Celli. In particular, personality was extracted from Facebook, Twitter, and Instagram. In the same years, automatic personality synthesis helped improve the coherence of simulated behavior in virtual agents. Scientific works by Michal Kosinski demonstrated the validity of personality computing from different digital footprints, in particular from user preferences such as Facebook page likes; they showed that machines can recognize personality better than humans, and raised a warning against Cambridge Analytica and the misuse of this kind of technology.

== Applications ==

Personality computing techniques, in particular personality recognition and perception, have applications in social media marketing, where they can help reduce the cost of advertising campaigns through psychological targeting.

Image-based modeling and rendering

In computer graphics and computer vision, image-based modeling and rendering (IBMR) methods rely on a set of two-dimensional images of a scene to generate a three-dimensional model and then render some novel views of this scene. The traditional approach of computer graphics has been to create a geometric model in 3D and reproject it onto a two-dimensional image. Computer vision, conversely, is mostly focused on detecting, grouping, and extracting features (edges, faces, etc.) present in a given picture and then trying to interpret them as three-dimensional clues. Image-based modeling and rendering allows the use of multiple two-dimensional images to directly generate novel two-dimensional images, skipping the manual modeling stage.

== Light modeling ==

Instead of considering only the physical model of a solid, IBMR methods usually focus more on light modeling. The fundamental concept behind IBMR is the plenoptic illumination function, which is a parametrisation of the light field. The plenoptic function describes the light rays contained in a given volume. It can be represented with seven dimensions: a ray is defined by its position $(x, y, z)$, its orientation $(\theta, \phi)$, its wavelength $\lambda$, and its time $t$, giving $P(x, y, z, \theta, \phi, \lambda, t)$. IBMR methods try to approximate the plenoptic function to render a novel set of two-dimensional images from another. Given the high dimensionality of this function, practical methods place constraints on the parameters in order to reduce the number of dimensions (typically to 2 to 4).
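
As an illustration of how such constraints reduce the dimensionality, the sketch below lists some commonly used reductions; these are examples chosen here, not a list taken from the text above. Fixing time and wavelength leaves five dimensions. Assuming in addition that radiance is constant along a ray in free space lets rays be indexed by four parameters, written here with the assumed two-plane coordinates $(u, v, s, w)$. Fixing the viewpoint as well leaves only the two direction angles, as in a panorama.

```latex
\begin{align*}
P_7 &= P(x, y, z, \theta, \phi, \lambda, t)
    && \text{full plenoptic function} \\
P_5 &= P(x, y, z, \theta, \phi)
    && \text{static scene, fixed wavelength} \\
P_4 &= P(u, v, s, w)
    && \text{rays in free space, two-plane parametrisation} \\
P_2 &= P(\theta, \phi)
    && \text{fixed viewpoint (panorama)}
\end{align*}
```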

== IBMR methods and algorithms ==

* View morphing generates a transition between images
* Panoramic imaging renders panoramas using image mosaics of individual still images
* Lumigraph relies on a dense sampling of a scene
* Space carving generates a 3D model based on a photo-consistency check

Neurocomputing (journal)

Neurocomputing is a peer-reviewed scientific journal covering research on artificial intelligence, machine learning, and neural computation. It was established in 1989 and is published by Elsevier. The editor-in-chief is Zidong Wang (Brunel University London). Independent scientometric studies noted that despite being one of the most productive journals in the field, it has kept its reputation across the years intact and plays an important role in leading the research in the area. The journal is abstracted and indexed in Scopus and Science Citation Index Expanded. According to the Journal Citation Reports, its 2023 impact factor is 5.5.