CLEVER score

CLEVER score

The CLEVER (Cross Lipschitz Extreme Value for nEtwork Robustness) score is a way of measuring the robustness of an artificial neural network towards adversarial attacks. It was developed by a team at the MIT-IBM Watson AI Lab in IBM Research and first presented at the 2018 International Conference on Learning Representations. It was mentioned and reviewed by Ian Goodfellow as well. It was adopted into an educational game Fool The Bank by Narendra Nath Joshi, Abhishek Bhandwaldar and Casey Dugan

Once (dating platform)

Once is an online dating platform founded in 2015. The platform offers users one selected match per day for more meaningful connections. == History == Once was established in 2015, the founders included dating industry entrepreneur Jean Meyer, who became a CEO of the company, as well as Guillaume Sempe and Guilhem Duche. It focused on providing a single daily match to its users. On its early stages Once secured a $3.5 million seed round from Partech Ventures and some private investors. The same year, it opened offices in Paris, and London. By 2016, it reached 1 million users. In 2020, the company was acquired by Dating Group for $18 million. Following the acquisition, Once underwent rebranding. Alexandra Beaumont took over leadership of the brand in 2021, driving growth, rebranding, and innovation. == Overview == Once provides an online dating service with a focus on thoughtful connections. Users receive one selected match per day, which encourages meaningful interactions. The platform operates primarily in the United States, the United Kingdom, Canada, France, and Spain. The platform is supported by Android, iOS, and Apple Watch OS.

Image destriping

Image destriping is the process of removing stripes or streaks from images and videos without disrupting the original image/video. These artifacts plague a range of fields in scientific imaging including atomic force microscopy, light sheet fluorescence microscopy, and planetary satellite imaging. The most common image processing techniques to reduce stripe artifacts is with Fourier filtering. Unfortunately, filtering methods risk altering or suppressing useful image data. Methods developed for multiple-sensor imaging systems in planetary satellites use statistical-based methods to match signal distribution across multiple sensors. More recently, a new class of approaches leverage compressed sensing, to regularize an optimization problem, and recover stripe free images. In many cases, these destriped images have little to no artifacts, even at low signal to noise ratios.

Neural field

In machine learning, a neural field (also known as implicit neural representation, neural implicit, or coordinate-based neural network), is a mathematical field that is fully or partially parametrized by a neural network. Initially developed to tackle visual computing tasks, such as rendering or reconstruction (e.g., neural radiance fields), neural fields emerged as a promising strategy to deal with a wider range of problems, including surrogate modelling of partial differential equations, such as in physics-informed neural networks. Differently from traditional machine learning algorithms, such as feed-forward neural networks, convolutional neural networks, or transformers, neural fields do not work with discrete data (e.g. sequences, images, tokens), but map continuous inputs (e.g., spatial coordinates, time) to continuous outputs (i.e., scalars, vectors, etc.). This makes neural fields not only discretization independent, but also easily differentiable. Moreover, dealing with continuous data allows for a significant reduction in space complexity, which translates to a much more lightweight network. == Formulation and training == According to the universal approximation theorem, provided adequate learning, sufficient number of hidden units, and the presence of a deterministic relationship between the input and the output, a neural network can approximate any function to any degree of accuracy. Hence, in mathematical terms, given a field y = Φ ( x ) {\textstyle {\boldsymbol {y}}=\Phi ({\boldsymbol {x}})} , with x ∈ R n {\displaystyle {\boldsymbol {x}}\in \mathbb {R} ^{n}} and y ∈ R m {\displaystyle {\boldsymbol {y}}\in \mathbb {R} ^{m}} , a neural field Ψ θ {\displaystyle \Psi _{\theta }} , with parameters θ {\displaystyle {\boldsymbol {\theta }}} , is such that: Ψ θ ( x ) = y ^ ≈ y {\displaystyle \Psi _{\theta }({\boldsymbol {x}})={\hat {\boldsymbol {y}}}\approx {\boldsymbol {y}}} === Training === For supervised tasks, given N {\displaystyle N} examples in the training dataset (i.e., ( x i , y i ) ∈ D t r a i n , i = 1 , … , N {\displaystyle ({\boldsymbol {x_{i}}},{\boldsymbol {y_{i}}})\in {\mathcal {D_{train}}},i=1,\dots ,N} ), the neural field parameters can be learned by minimizing a loss function L {\displaystyle {\mathcal {L}}} (e.g., mean squared error). The parameters θ ~ {\displaystyle {\tilde {\theta }}} that satisfy the optimization problem are found as: θ ~ = argmin θ 1 N ∑ ( x i , y i ) ∈ D t r a i n L ( Ψ θ ( x i ) , y i ) {\displaystyle {\tilde {\boldsymbol {\theta }}}={\underset {\boldsymbol {\theta }}{\text{argmin}}}\;{\frac {1}{N}}\sum _{({\boldsymbol {x_{i}}},{\boldsymbol {y_{i}}})\in {\mathcal {D_{train}}}}{\mathcal {L}}(\Psi _{\theta }({\boldsymbol {x}}_{i}),{\boldsymbol {y}}_{i})} Notably, it is not necessary to know the analytical expression of Φ {\displaystyle \Phi } , for the previously reported training procedure only requires input-output pairs. Indeed, a neural field is able to offer a continuous and differentiable surrogate of the true field, even from purely experimental data. Moreover, neural fields can be used in unsupervised settings, with training objectives that depend on the specific task. For example, physics-informed neural networks may be trained on just the residual. === Spectral bias === As for any artificial neural network, neural fields may be characterized by a spectral bias (i.e., the tendency to preferably learn the low frequency content of a field), possibly leading to a poor representation of the ground truth. In order to overcome this limitation, several strategies have been developed. For example, SIREN uses sinusoidal activations, while the Fourier-features approach embeds the input through sines and cosines. == Conditional neural fields == In many real-world cases, however, learning a single field is not enough. For example, when reconstructing 3D vehicle shapes from Lidar data, it is desirable to have a machine learning model that can work with arbitrary shapes (e.g., a car, a bicycle, a truck, etc.). The solution is to include additional parameters, the latent variables (or latent code) z ∈ R d {\displaystyle {\boldsymbol {z}}\in \mathbb {R} ^{d}} , to vary the field and adapt it to diverse tasks. === Latent code production === When dealing with conditional neural fields, the first design choice is represented by the way in which the latent code is produced. Specifically, two main strategies can be identified: Encoder: the latent code is the output of a second neural network, acting as an encoder. During training, the loss function is the objective used to learn the parameters of both the neural field and the encoder. Auto-decoding: each training example has its own latent code, jointly trained with the neural field parameters. When the model has to process new examples (i.e., not originally present in the training dataset), a small optimization problem is solved, keeping the network parameters fixed and only learning the new latent variables. Since the latter strategy requires additional optimization steps at inference time, it sacrifices speed, but keeps the overall model smaller. Moreover, despite being simpler to implement, an encoder may harm the generalization capabilities of the model. For example, when dealing with a physical scalar field f : R 2 → R {\displaystyle f:\mathbb {R} ^{2}\rightarrow \mathbb {R} } (e.g., the pressure of a 2D fluid), an auto-decoder-based conditional neural field can map a single point to the corresponding value of the field, following a learned latent code z {\displaystyle {\boldsymbol {z}}} . However, if the latent variables were produced by an encoder, it would require access to the entire set of points and corresponding values (e.g. as a regular grid or a mesh graph), leading to a less robust model. === Global and local conditioning === In a neural field with global conditioning, the latent code does not depend on the input and, hence, it offers a global representation (e.g., the overall shape of a vehicle). However, depending on the task, it may be more useful to divide the domain of x {\displaystyle {\boldsymbol {x}}} in several subdomains, and learn different latent codes for each of them (e.g., splitting a large and complex scene in sub-scenes for a more efficient rendering). This is called local conditioning. === Conditioning strategies === There are several strategies to include the conditioning information in the neural field. In the general mathematical framework, conditioning the neural field with the latent variables is equivalent to mapping them to a subset θ ∗ {\displaystyle {\boldsymbol {\theta }}^{}} of the neural field parameters: θ ∗ = Γ ( z ) {\displaystyle {\boldsymbol {\theta }}^{}=\Gamma ({\boldsymbol {z}})} In practice, notable strategies are: Concatenation: the neural field receives, as input, the concatenation of the original input x {\displaystyle {\boldsymbol {x}}} with the latent codes z {\displaystyle {\boldsymbol {z}}} . For feed-forward neural networks, this is equivalent to setting θ ∗ {\displaystyle {\boldsymbol {\theta }}^{}} as the bias of the first layer and Γ ( z ) {\displaystyle \Gamma ({\boldsymbol {z}})} as an affine transformation. Hypernetworks: a hypernetwork is a neural network that outputs the parameters of another neural network. Specifically, it consists of approximating Γ ( z ) {\displaystyle \Gamma ({\boldsymbol {z}})} with a neural network Γ ^ γ ( z ) {\displaystyle {\hat {\Gamma }}_{\gamma }({\boldsymbol {z}})} , where γ {\displaystyle {\boldsymbol {\gamma }}} are the trainable parameters of the hypernetwork. This approach is the most general, as it allows to learn the optimal mapping from latent codes to neural field parameters. However, hypernetworks are associated to larger computational and memory complexity, due to the large number of trainable parameters. Hence, leaner approaches have been developed. For example, in the Feature-wise Linear Modulation (FiLM), the hypernetwork only produces scale and bias coefficients for the neural field layers. === Meta-learning === Instead of relying on the latent code to adapt the neural field to a specific task, it is also possible to exploit gradient-based meta-learning. In this case, the neural field is seen as the specialization of an underlying meta-neural-field, whose parameters are modified to fit the specific task, through a few steps of gradient descent. An extension of this meta-learning framework is the CAVIA algorithm, that splits the trainable parameters in context-specific and shared groups, improving parallelization and interpretability, while reducing meta-overfitting. This strategy is similar to the auto-decoding conditional neural field, but the training procedure is substantially different. == Applications == Thanks to the possibility of efficiently modelling diverse mathematical fields with neural networks, neural fields have been applied to a wide range of problems: 3D scene reconstruction: neural fields can be used to model t

LanguageWare

LanguageWare is a natural language processing (NLP) technology developed by IBM, which allows applications to process natural language text. It comprises a set of Java libraries that provide a range of NLP functions: language identification, text segmentation/tokenization, normalization, entity and relationship extraction, and semantic analysis and disambiguation. The analysis engine uses a finite-state machine approach at multiple levels, which aids its performance characteristics while maintaining a reasonably small footprint. The behaviour of the system is driven by a set of configurable lexico-semantic resources which describe the characteristics and domain of the processed language. A default set of resources comes as part of LanguageWare and these describe the native language characteristics, such as morphology, and the basic vocabulary for the language. Supplemental resources have been created that capture additional vocabularies, terminologies, rules and grammars, which may be generic to the language or specific to one or more domains. A set of Eclipse-based customization tooling, LanguageWare Resource Workbench, is available on IBM's alphaWorks site, and allows domain knowledge to be compiled into these resources and thereby incorporated into the analysis process. LanguageWare can be deployed as a set of UIMA-compliant annotators, Eclipse plug-ins or Web Services.

Feature engineering

Feature engineering is a preprocessing step in supervised machine learning and statistical modeling which transforms raw data into a more effective set of inputs. Each input comprises several attributes, known as features. By providing models with relevant information, feature engineering significantly enhances their predictive accuracy and decision-making capability. Beyond machine learning, the principles of feature engineering are applied in various scientific fields, including physics. For example, physicists construct dimensionless numbers such as the Reynolds number in fluid dynamics, the Nusselt number in heat transfer, and the Archimedes number in sedimentation. They also develop first approximations of solutions, such as analytical solutions for the strength of materials in mechanics. == Clustering == One of the applications of feature engineering has been clustering of feature-objects or sample-objects in a dataset. Especially, feature engineering based on matrix decomposition has been extensively used for data clustering under non-negativity constraints on the feature coefficients. These include Non-Negative Matrix Factorization (NMF), Non-Negative Matrix-Tri Factorization (NMTF), Non-Negative Tensor Decomposition/Factorization (NTF/NTD), etc. The non-negativity constraints on coefficients of the feature vectors mined by the above-stated algorithms yields a part-based representation, and different factor matrices exhibit natural clustering properties. Several extensions of the above-stated feature engineering methods have been reported in literature, including orthogonality-constrained factorization for hard clustering, and manifold learning to overcome inherent issues with these algorithms. Other classes of feature engineering algorithms include leveraging a common hidden structure across multiple inter-related datasets to obtain a consensus (common) clustering scheme. An example is Multi-view Classification based on Consensus Matrix Decomposition (MCMD), which mines a common clustering scheme across multiple datasets. MCMD is designed to output two types of class labels (scale-variant and scale-invariant clustering), and: is computationally robust to missing information, can obtain shape- and scale-based outliers, and can handle high-dimensional data effectively. Coupled matrix and tensor decompositions are popular in multi-view feature engineering. == Predictive modelling == Feature engineering in machine learning and statistical modeling involves selecting, creating, transforming, and extracting data features. Key components include feature creation from existing data, transforming and imputing missing or invalid features, reducing data dimensionality through methods like Principal Components Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA), and selecting the most relevant features for model training based on importance scores and correlation matrices. Features vary in significance. Even relatively insignificant features may contribute to a model. Feature selection can reduce the number of features to prevent a model from becoming too specific to the training data set (overfitting). Feature explosion occurs when the number of identified features is too large for effective model estimation or optimization. Common causes include: Feature templates - implementing feature templates instead of coding new features Feature combinations - combinations that cannot be represented by a linear system Feature explosion can be limited via techniques such as regularization, kernel methods, and feature selection. == Automation == Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. Related academic literature can be roughly separated into two types: Multi-relational Decision Tree Learning (MRDTL) uses a supervised algorithm that is similar to a decision tree. Deep Feature Synthesis uses simpler methods. === Multi-relational Decision Tree Learning (MRDTL) === Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It innovatively uses selection graphs as decision nodes, refined systematically until a specific termination criterion is reached. Most MRDTL studies base implementations on relational databases, which results in many redundant operations. These redundancies can be reduced by using techniques such as tuple id propagation. === Open-source implementations === There are a number of open-source libraries and tools that automate feature engineering on relational data and time series: featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning. MCMD: An open-source feature engineering algorithm for joint clustering of multiple datasets. OneBM or One-Button Machine combines feature transformations and feature selection on relational data with feature selection techniques. OneBM helps data scientists reduce data exploration time allowing them to try and error many ideas in short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with a little effort, time, and cost. getML community is an open source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It has been shown to be at least 60 times faster than tsflex, tsfresh, tsfel, featuretools or kats. tsfresh is a Python library for feature extraction on time series data. It evaluates the quality of the features using hypothesis testing. tsflex is an open source Python library for extracting features from time series data. Despite being 100% written in Python, it has been shown to be faster and more memory efficient than tsfresh, seglearn or tsfel. seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library. tsfel is a Python package for feature extraction on time series data. kats is a Python toolkit for analyzing time series data. === Deep feature synthesis === The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition. == Feature stores == The feature store is where the features are stored and organized for the explicit purpose of being used to either train models (by data scientists) or make predictions (by applications that have a trained model). It is a central location where you can either create or update groups of features created from multiple different data sources, or create and update new datasets from those feature groups for training models or for use in applications that do not want to compute the features but just retrieve them when it needs them to make predictions. A feature store includes the ability to store code used to generate features, apply the code to raw data, and serve those features to models upon request. Useful capabilities include feature versioning and policies governing the circumstances under which features can be used. Feature stores can be standalone software tools or built into machine learning platforms. == Alternatives == Feature engineering can be a time-consuming and error-prone process, as it requires domain expertise and often involves trial and error. Deep learning algorithms may be used to process a large raw dataset without having to resort to feature engineering. However, deep learning algorithms still require careful preprocessing and cleaning of the input data. In addition, choosing the right architecture, hyperparameters, and optimization algorithm for a deep neural network can be a challenging and iterative process.

Intrinsic dimension

In mathematics, the intrinsic dimension of a subset can be thought of as the minimal number of variables needed to represent the subset. The concept has widespread applications in geometry, dynamical systems, signal processing, statistics, and other fields. Due to its widespread applications and vague conceptualization, there are many different ways to define it rigorously. Consequently, the same set might have different intrinsic dimensions according to different definitions. The intrinsic dimension can be used as a lower bound of what dimension it is possible to compress a data set into through dimension reduction, but it can also be used as a measure of the complexity of the data set or signal. For a data set or signal of N variables, its intrinsic dimension M satisfies 0 ≤ M ≤ N, although estimators may yield higher values. == Exact dimension == === Differential === In differential geometry, given a differentiable manifold N and a submanifold M, the intrinsic dimension of M is its dimension. Suppose N has n dimensions and M has m dimensions, then that means around any point in M, there exists a local coordinate system ( x 1 , … , x m , x m + 1 , … , x n ) {\displaystyle (x_{1},\dots ,x_{m},x_{m+1},\dots ,x_{n})} of N, such that the manifold M is simply the subset of N defined by x m + 1 = 0 , … , x n = 0 {\displaystyle x_{m+1}=0,\dots ,x_{n}=0} . === Metric === Given a mere metric space, we can still define its intrinsic dimension. The most general case is the Hausdorff dimension, though for metric spaces occurring in practice, the box-counting dimension and the packing dimension often are identical to the Hausdorff dimension. Let X , d {\textstyle X,d} be a metric space and A ⊂ X {\textstyle A\subset X} be totally bounded. Define the covering number N ( A , ε ) = min { k : A ⊂ ⋃ i = 1 k B ( x i , ε ) } . {\displaystyle N(A,\varepsilon )=\min \left\{k:A\subset \bigcup _{i=1}^{k}B\left(x_{i},\varepsilon \right)\right\}.} The metric entropy is H ( A , ε ) = log ⁡ N ( A , ε ) {\textstyle H(A,\varepsilon )=\log N(A,\varepsilon )} (any log base). The upper and lower metric entropy dimensions are dim ¯ E A = lim sup ε ↓ 0 H ( A , ε ) log ⁡ ( 1 / ε ) , dim _ E A = lim inf ε ↓ 0 H ( A , ε ) log ⁡ ( 1 / ε ) . {\displaystyle {\overline {\dim }}_{E}A=\limsup _{\varepsilon \downarrow 0}{\frac {H(A,\varepsilon )}{\log(1/\varepsilon )}},\quad {\underline {\dim }}_{E}A=\liminf _{\varepsilon \downarrow 0}{\frac {H(A,\varepsilon )}{\log(1/\varepsilon )}}.} If they are equal, then dim E ⁡ A {\textstyle \operatorname {dim} _{E}A} is that common value, called the metric entropy dimension. The entropy dimensions are usually used in information theory, and especially coding theory, since entropy is involved in its definition. === Topological === If X {\displaystyle X} is merely a topological space, then we can still define its intrinsic dimension, using the topological dimension or Lebesgue covering dimension. An open cover of a topological space X is a family of open sets Uα such that their union is the whole space, ∪ α {\displaystyle \cup _{\alpha }} Uα = X. The order or ply of an open cover A {\displaystyle {\mathfrak {A}}} = {Uα} is the smallest number m (if it exists) for which each point of the space belongs to at most m open sets in the cover: in other words Uα1 ∩ ⋅⋅⋅ ∩ Uαm+1 = ∅ {\displaystyle \emptyset } for α1, ..., αm+1 distinct. A refinement of an open cover A {\displaystyle {\mathfrak {A}}} = {Uα} is another open cover B {\displaystyle {\mathfrak {B}}} = {Vβ}, such that each Vβ is contained in some Uα. The covering dimension of a topological space X is defined to be the minimum value of n such that every finite open cover A {\displaystyle {\mathfrak {A}}} of X has an open refinement B {\displaystyle {\mathfrak {B}}} with order n + 1. The refinement B {\displaystyle {\mathfrak {B}}} can always be chosen to be finite. Thus, if n is finite, Vβ1 ∩ ⋅⋅⋅ ∩ Vβn+2 = ∅ {\displaystyle \emptyset } for β1, ..., βn+2 distinct. If no such minimal n exists, the space is said to have infinite covering dimension. == Introductory example == Let f ( x 1 , x 2 ) {\textstyle f(x_{1},x_{2})} be a two-variable function (or signal) which is of the form f ( x 1 , x 2 ) = g ( x 1 ) {\textstyle f(x_{1},x_{2})=g(x_{1})} for some one-variable function g which is not constant. This means that f varies, in accordance to g, with the first variable or along the first coordinate. On the other hand, f is constant with respect to the second variable or along the second coordinate. It is only necessary to know the value of one, namely the first, variable in order to determine the value of f. Hence, it is a two-variable function but its intrinsic dimension is one. A slightly more complicated example is f ( x 1 , x 2 ) = g ( x 1 + x 2 ) {\textstyle f(x_{1},x_{2})=g(x_{1}+x_{2})} . f is still intrinsic one-dimensional, which can be seen by making a variable transformation y 1 = x 1 + x 2 {\textstyle y_{1}=x_{1}+x_{2}} and y 2 = x 1 − x 2 {\textstyle y_{2}=x_{1}-x_{2}} which gives f ( y 1 + y 2 2 , y 1 − y 2 2 ) = g ( y 1 ) {\textstyle f\left({\frac {y_{1}+y_{2}}{2}},{\frac {y_{1}-y_{2}}{2}}\right)=g\left(y_{1}\right)} . Since the variation in f can be described by the single variable y1 its intrinsic dimension is one. For the case that f is constant, its intrinsic dimension is zero since no variable is needed to describe variation. For the general case, when the intrinsic dimension of the two-variable function f is neither zero or one, it is two. In the literature, functions which are of intrinsic dimension zero, one, or two are sometimes referred to as i0D, i1D or i2D, respectively. == Signal processing == In signal processing of multidimensional signals, the intrinsic dimension of the signal describes how many variables are needed to generate a good approximation of the signal. For an N-variable function f, the set of variables can be represented as an N-dimensional vector x: f = f ( x ) where x = ( x 1 , … , x N ) {\textstyle f=f\left(\mathbf {x} \right){\text{ where }}\mathbf {x} =\left(x_{1},\dots ,x_{N}\right)} . If for some M-variable function g and M × N matrix A it is the case that for all x; f ( x ) = g ( A x ) , {\textstyle f(\mathbf {x} )=g(\mathbf {Ax} ),} M is the smallest number for which the above relation between f and g can be found, then the intrinsic dimension of f is M. The intrinsic dimension is a characterization of f, it is not an unambiguous characterization of g nor of A. That is, if the above relation is satisfied for some f, g, and A, it must also be satisfied for the same f and g′ and A′ given by g ′ ( y ) = g ( B y ) {\textstyle g'\left(\mathbf {y} \right)=g\left(\mathbf {By} \right)} and A ′ = B − 1 A {\textstyle \mathbf {A'} =\mathbf {B} ^{-1}\mathbf {A} } where B is a non-singular M × M matrix, since f ( x ) = g ′ ( A ′ x ) = g ( B A ′ x ) = g ( A x ) {\textstyle f\left(\mathbf {x} \right)=g'\left(\mathbf {A'x} \right)=g\left(\mathbf {BA'x} \right)=g\left(\mathbf {Ax} \right)} . == The Fourier transform of signals of low intrinsic dimension == An N variable function which has intrinsic dimension M < N has a characteristic Fourier transform. Intuitively, since this type of function is constant along one or several dimensions its Fourier transform must appear like an impulse (the Fourier transform of a constant) along the same dimension in the frequency domain. === A simple example === Let f be a two-variable function which is i1D. This means that there exists a normalized vector n ∈ R 2 {\textstyle \mathbf {n} \in \mathbb {R} ^{2}} and a one-variable function g such that f ( x ) = g ( n T x ) {\textstyle f(\mathbf {x} )=g(\mathbf {n} ^{\operatorname {T} }\mathbf {x} )} for all x ∈ R 2 {\textstyle \mathbf {x} \in \mathbb {R} ^{2}} . If F is the Fourier transform of f (both are two-variable functions) it must be the case that F ( u ) = G ( n T u ) ⋅ δ ( m T u ) {\textstyle F\left(\mathbf {u} \right)=G\left(\mathbf {n} ^{\mathrm {T} }\mathbf {u} \right)\cdot \delta \left(\mathbf {m} ^{\mathrm {T} }\mathbf {u} \right)} . Here G is the Fourier transform of g (both are one-variable functions), δ is the Dirac impulse function and m is a normalized vector in R 2 {\textstyle \mathbb {R} ^{2}} perpendicular to n. This means that F vanishes everywhere except on a line which passes through the origin of the frequency domain and is parallel to m. Along this line F varies according to G. === The general case === Let f be an N-variable function which has intrinsic dimension M, that is, there exists an M-variable function g and M × N matrix A such that f ( x ) = g ( A x ) ∀ x {\textstyle f(\mathbf {x} )=g(\mathbf {Ax} )\quad \forall \mathbf {x} } . Its Fourier transform F can then be described as follows: F vanishes everywhere except for a subspace of dimension M The subspace M is spanned by the rows of the matrix A In the subspace, F varies according to G the Fourier transform of g == Generalizations == The type of intrinsic dimension described above assume