AI Data Delivery F5

AI Data Delivery F5 — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • AARON

    AARON

    AARON is the collective name for a series of computer programs written by artist Harold Cohen that create original artistic images autonomously, which set it apart from previous programs. Proceeding from Cohen's initial question "What are the minimum conditions under which a set of marks functions as an image?", AARON was in development between 1972 and the 2010s. As the software is not open source, its development effectively ended with Cohen's death in 2016. The name "AARON" does not seem to be an acronym; rather, it was a name chosen to start with the letter "A" so that the names of successive programs could follow it alphabetically. However, Cohen did not create any other major programs. Initial versions of AARON created abstract drawings that grew more complex through the 1970s. More representational imagery was added in the 1980s; first rocks, then plants, then people. In the 1990s more representational figures set in interior scenes were added, along with color. AARON returned to more abstract imagery, this time in color, in the early 2000s. Cohen used machines that allowed AARON to produce physical artwork. The first machines drew in black and white using a succession of custom-built "turtle" and flatbed plotter devices. Cohen would sometimes color these images by hand in fabric dye (Procion), or scale them up to make larger paintings and murals. In the 1990s Cohen built a series of digital painting machines to output AARON's images in ink and fabric dye. His later work used a large-scale inkjet printer on canvas. Development of AARON began in the C programming language then switched to Lisp in the early 1990s. Cohen credits Lisp with helping him solve the challenges he faced in adding color capabilities to AARON. An article about Cohen appeared in Computer Answers that describes AARON and shows two line drawings that were exhibited at the Tate gallery. The article goes on to describe the workings of AARON, then running on a DEC VAX 750 minicomputer. Raymond Kurzweil's company has produced a downloadable screensaver of AARON for Microsoft Windows PCs. This version of AARON can also produce printable images. AARON's source code is not publicly available, but Cohen has described AARON's operations in various essays and it is discussed in abstract in Pamela McCorduck's book. AARON cannot learn new styles or imagery on its own; each new capability must be hand-coded by Cohen. It is capable of producing a practically infinite supply of distinct images in its own style. Examples of these images have been exhibited in galleries worldwide. AARON's artwork has been used as an artistic equivalent of the Turing test. It does seem however that AARON's output follows a noticeable formula (figures standing next to a potted plant, framed within a colored square is a common theme). Cohen is very careful not to claim that AARON is creative. But he does ask "If what AARON is making is not art, what is it exactly, and in what ways, other than its origin, does it differ from the 'real thing?' If it is not thinking, what exactly is it doing?" — The further exploits of AARON, Painter. The Whitney Museum featured AARON in 2024, showcasing the evolution of AARON as the earliest artificial intelligence (AI) program for artmaking.

    Read more →
  • Aggregation (linguistics)

    Aggregation (linguistics)

    In linguistics, aggregation is a subtask of natural language generation, which involves merging syntactic constituents (such as sentences and phrases) together. Sometimes aggregation can be done at a conceptual level. == Examples == A simple example of syntactic aggregation is merging the two sentences John went to the shop and John bought an apple into the single sentence John went to the shop and bought an apple. Syntactic aggregation can be much more complex than this. For example, aggregation can embed one of the constituents in the other; e.g., we can aggregate John went to the shop and The shop was closed into the sentence John went to the shop, which was closed. From a pragmatic perspective, aggregating sentences together often suggests to the reader that these sentences are related to each other. If this is not the case, the reader may be confused. For example, someone who reads John went to the shop and bought an apple may infer that the apple was bought in the shop; if this is not the case, then these sentences should not be aggregated. == Algorithms and issues == Aggregation algorithms must do two things: Decide when two constituents should be aggregated Decide how two constituents should be aggregated, and create the aggregated structure The first issue, deciding when to aggregate, is poorly understood. Aggegration decisions certainly depend on the semantic relations between the constituents, as mentioned above; they also depend on the genre (e.g., bureaucratic texts tend to be more aggregated than instruction manuals). They probably should depend on rhetorical and discourse structure. The literacy level of the reader is also probably important (poor readers need shorter sentences). But we have no integrated model which brings all these factors together into a single algorithm. With regard to the second issue, there have been some studies of different types of aggregation, and how they should be carried out. Harbusch and Kempen describe several syntactic aggregation strategies. In their terminology, John went to the shop and bought an apple is an example of forward conjunction Reduction Much less is known about conceptual aggregation. Di Eugenio et al. show how conceptual aggregation can be done in an intelligent tutoring system, and demonstrate that performing such aggregation makes the system more effective (and that conceptual aggregation make a bigger impact than syntactic aggregation). == Software == Unfortunately there is not much software available for performing aggregation. However the SimpleNLG system does include limited support for basic aggregation. For example, the following code causes SimpleNLG to print out The man is hungry and buys an apple.

    Read more →
  • Common data model

    Common data model

    A common data model (CDM) can refer to any standardised data model which allows for data and information exchange between different applications and data sources. Common data models aim to standardise logical infrastructure so that related applications can "operate on and share the same data", and can be seen as a way to "organize data from many sources that are in different formats into a standard structure". A common data model has been described as one of the components of a "strong information system". A standardised common data model has also been described as a typical component of a well designed agile application besides a common communication protocol. Providing a single common data model within an organisation is one of the typical tasks of a data warehouse. == Examples of common data models == === Border crossings === X-trans.eu was a cross-border pilot project between the Free State of Bavaria (Germany) and Upper Austria with the aim of developing a faster procedure for the application and approval of cross-border large-capacity transports. The portal was based on a common data model that contained all the information required for approval. === Climate data === The Climate Data Store Common Data Model is a common data model set up by the Copernicus Climate Change Service for harmonising essential climate variables from different sources and data providers. === General information technology === Within service-oriented architecture, S-RAMP is a specification released by HP, IBM, Software AG, TIBCO, and Red Hat which defines a common data model for SOA repositories as well as an interaction protocol to facilitate the use of common tooling and sharing of data. Content Management Interoperability Services (CMIS) is an open standard for inter-operation of different content management systems over the internet, and provides a common data model for typed files and folders used with version control. The NetCDF software libraries for array-oriented scientific data implements a common data model called the NetCDF Java common data model, which consists of three layers built on top of each other to add successively richer semantics. === Health === Within genomic and medical data, the Observational Medical Outcomes Partnership (OMOP) research program established under the U.S. National Institutes of Health has created a common data model for claims and electronic health records which can accommodate data from different sources around the world. PCORnet, which was developed by the Patient-Centered Outcomes Research Institute, is another common data model for health data including electronic health records and patient claims. The Sentinel Common Data Model was initially started as Mini-Sentinel in 2008. It is used by the Sentinel Initiative of the USA's Food and Drug Administration. The Generalized Data Model was first published in 2019. It was designed to be a stand-alone data model as well as to allow for further transformation into other data models (e.g., OMOP, PCORNet, Sentinel). It has a hierarchical structure to flexibly capture relationships among data elements. The JANUS clinical trial data repository also provides a common data model which is based on the SDTM standard to represent clinical data submitted to regulatory agencies, such as tabulation datasets, patient profiles, listings, etc. === Logistics === SX000i is a specification developed jointly by the Aerospace and Defence Industries Association of Europe (ASD) and the American Aerospace Industries Association (AIA) to provide information, guidance and instructions to ensure compatibility and the commonality. The associated SX002D specification contains a common data model. === Microsoft Common Data Model === The Microsoft Common Data Model is a collection of many standardised extensible data schemas with entities, attributes, semantic metadata, and relationships, which represent commonly used concepts and activities in various businesses areas. It is maintained by Microsoft and its partners, and is published on GitHub. Microsoft's Common Data Model is used amongst others in Microsoft Dataverse and with various Microsoft Power Platform and Microsoft Dynamics 365 services. === Rail transport === RailTopoModel is a common data model for the railway sector. === Other === There are many more examples of various common data models for different uses published by different sources.

    Read more →
  • Stochastic parrot

    Stochastic parrot

    In machine learning, the term stochastic parrot is a metaphor that frames large language models as systems that statistically mimic text without real understanding. The word "stochastic" – from the ancient Greek "στοχαστικός" (stokhastikos, 'based on guesswork') – is a term from probability theory meaning "randomly determined". The word "parrot" refers to parrots' ability to mimic human speech. The term was introduced in a 2021 paper on AI ethics titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" and authored by Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell. The paper outlined possible risks associated with large language models (LLMs). In December 2020, it was the subject of a workplace dispute between Gebru (then co-leader of Google's Ethical Artificial Intelligence Team) and Google, which had requested the retraction of the paper. The incident culminated in Gebru's controversial departure from the company. The paper was later presented at the 2021 ACM Conference, and the term "stochastic parrot" has seen widespread use in academic research concerning generative AI and LLMs. The term has been interpreted negatively as an insult towards AI. == Background == Timnit Gebru is an AI ethics researcher, Emily M. Bender is a linguist specializing in computational linguistics, and Margaret Mitchell is a computer scientist specializing in algorithmic bias. Gebru had joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Mitchell. In late 2020, the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" was co-written by Gebru and five other researchers, four of whom were Google employees. The paper argues that large language models (LLMs) present significant risks such as environmental and financial costs, inscrutability leading to unknown dangerous biases, and potential for deception as LLMs do not understand the concepts underlying what they learn. The paper states that LLMs are "stitching together sequences of linguistic forms ... observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning." Therefore, they are labeled "stochastic parrots". === Dismissal of Gebru by Google === After the paper was submitted for consideration to the 2021 ACM Conference, Google requested that Gebru either retract the paper from the conference or remove the names of Google employees from it. Gebru refused to do so without further discussion, and emailed Google Research vice president Megan Kacholia that if the company could not explain the request for retraction and address other concerns regarding similar projects, she would plan to resign after a transition period, stating that they could "work on a last date". The following day, on December 2, 2020, Gebru received an email saying that Google was "accepting her resignation". Her abrupt firing sparked protests by Google employees and negative publicity for the company. == Usage == The phrase has been used by AI skeptics to signify that LLMs lack understanding of the meaning of their outputs. Sam Altman, CEO of OpenAI, used the term shortly after the release of ChatGPT in December 2022, tweeting "i am a stochastic parrot, and so r u". The term was nominated as the 2023 AI-related Word of the Year by the American Dialect Society. == Debate == Some LLMs, such as ChatGPT, have become capable of interacting with users in convincingly human-like conversations. The development of these new systems has deepened the discussion of the extent to which LLMs understand or are simply "parroting". According to machine learning researchers Lindholm, Wahlström, Lindsten, and Schön, the term "stochastic parrot" highlights two vital limitations of LLMs: LLMs are limited by the data they are trained on and are simply stochastically repeating contents of datasets. Because they are just making up outputs based on training data, LLMs do not understand if they are saying something incorrect or inappropriate. Lindholm et al. noted that, with poor quality datasets and other limitations, a learning machine might produce results that are "dangerously wrong". === Subjective experience === In the mind of a human being, words and language correspond to things one has experienced. For LLMs, according to proponents of the theory, words correspond only to other words and patterns of usage fed into their training data. Proponents of the idea of stochastic parrots thus conclude that statements about LLMs are due to "the human tendency to attribute meaning to text", and claim this occurs despite the LLMs not actually understanding language. === Fine-tuning === Kelsey Piper argued that the claim that LLMs are stochastic parrots or mere "next-token predictors" focuses on pre-training, ignoring that modern LLMs are also fine-tuned to follow instructions and to prefer accurate answers. === Hallucinations and mistakes === The tendency of LLMs to pass off false information as fact is held as support. Called hallucinations or confabulations, LLMs will occasionally synthesize information that matches some pattern. LLMs may fail to distinguish fact and fiction, which leads to the claim that they can't connect words to a comprehension of the world, as humans do. Furthermore, LLMs may fail to decipher complex or ambiguous grammar cases that rely on understanding the meaning of language. For example: The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore. Can I replace 'my favorite newspaper' by 'the wet newspaper that fell down off the table' in the second sentence? GPT-4, an LLM released in March 2023, responded yes, not understanding that the meaning of "newspaper" is different in these two contexts; it is first an object and second an institution. === Benchmarks and experiments === One argument against the hypothesis that LLMs are stochastic parrot is their results on benchmarks for reasoning, common sense and language understanding. In 2023, some LLMs have shown good results on many language understanding tests, such as the Super General Language Understanding Evaluation (SuperGLUE). GPT-4 scored in the >90th-percentile on the Uniform Bar Examination and achieved 93% accuracy on the MATH benchmark of high-school Olympiad problems, results that exceed rote pattern-matching expectations. Such tests, and the smoothness of many LLM responses, help as many as 51% of AI professionals believe they can truly understand language with enough data, according to a 2022 survey. === Expert rebuttals === Some AI researchers dispute the notion that LLMs merely "parrot" their training data. Geoffrey Hinton, a pioneering figure in neural networks, counters that the metaphor misunderstands the prerequisite for accurate language prediction. He argues that "to predict the next word accurately, you have to understand the sentence", a view he presented on 60 Minutes in 2023. From this perspective, understanding is not an alternative to statistical prediction, but rather an emergent property required to perform it effectively at scale. Hinton also uses logical puzzles to demonstrate that LLMs actually understand language. A 2024 Scientific American investigation described a closed Berkeley workshop where state-of-the-art models solved novel tier-4 mathematics problems and produced coherent proofs, indicating reasoning abilities beyond memorization. The GPT-4 Technical Report showed human-level results on professional and academic exams (e.g., the Uniform Bar Exam and USMLE), challenging the "parrot" characterization. Anthropic conducted mechanistic interpretability research on Claude, using attribution graphs to identify circuits. The research showed how the LLM processes information via chains of fuzzy logical inference, and indicated an ability to plan ahead. They found that Claude 3.5 Haiku "employs remarkably general abstractions", forms "internally generated plans for its future outputs" and "works backwards from its longer-term goals". They noted that "The mechanisms of the model can apparently only be faithfully described using an overwhelmingly large causal graph." They also found that the model includes "mechanisms that could underlie a simple form of metacognition", in that it "thinks about" the level of its own knowledge before reaching its answer. === Interpretability === Another line of evidence against the 'stochastic parrot' claim comes from mechanistic interpretability, a research field dedicated to reverse-engineering LLMs to understand their internal workings. Rather than only observing the model's input-output behavior, these techniques probe the model's internal activations, which can be used to determine if they contain structured representations of the world. The goal is to investigate whether LLMs are merely manipulating surface statistics or if t

    Read more →
  • AdBlock

    AdBlock

    AdBlock is an ad-blocking browser extension for Google Chrome, Apple Safari (desktop and mobile), Firefox, Samsung Internet, Microsoft Edge and Opera. AdBlock allows users to prevent page elements, such as advertisements, from being displayed. It is free to download and use, and it includes optional donations to the developers. The AdBlock extension was created on December 8, 2009, which is the day that supports for extensions was added to Google Chrome. It was one of the first Google Chrome extensions that was made. Since 2016, AdBlock has been based on the Adblock Plus source code. In July 2018, AdBlock acquired uBlock, a commercial ad-blocker owned by uBlock LLC and based on uBlock Origin. In April 2021, eyeo GmbH (developer of Adblock Plus) announced its purchase of AdBlock, Inc (formerly BetaFish, Inc). == Crowdfunding == Gundlach launched a crowdfunding campaign on Crowdtilt in August 2013 in order to fund an ad campaign to raise awareness of ad-blocking and to rent a billboard at Times Square. After the one-month campaign, it raised $55,000. == Sales and acceptable ads == AdBlock was sold to an anonymous buyer in 2015 and on October 15, 2015, Gundlach's name was taken down from the site. In the terms of the deal, the original developer Michael Gundlach left operations to Adblock's continuing director, Gabriel Cubbage, and as of October 2, 2015, AdBlock began participating in the Acceptable Ads program. Acceptable Ads identifies "non-annoying" ads, which AdBlock shows by default. The intent is to allow non-invasive advertising, to either maintain support for websites that rely on advertising as a main source of revenue or for websites that have an agreement with the program. == Filters == AdBlock uses EasyList, the same filter syntax as Adblock Plus for Firefox, and natively supports the use of a number of filter lists. == Partnership with Amnesty International == On March 12, 2016, in support of World Day Against Cyber Censorship, and in partnership with Amnesty International, instead of blocking ads, AdBlock replaced ads with banners linked to articles on Amnesty's website, written by prominent free speech advocates such as Edward Snowden, to raise awareness of government-imposed online censorship and digital privacy issues around the world. The campaign was met with both praise and criticism, with AdBlock's CEO, Gabriel Cubbage, defending the decision in an essay on AdBlock's website, saying "We’re showing you Amnesty banners, just for today, because we believe users should be part of the conversation about online privacy. Tomorrow, those spaces will be vacant again. But take a moment to consider that in an increasingly information-driven world, when your right to digital privacy is threatened, so is your right to free expression." Meanwhile, Simon Sharwood of The Register characterized Cubbage's position as "'You should control your computer except when we feel political', says AdBlock CEO". == AdBlock for Firefox == On September 13, 2014, the AdBlock team released a version for Firefox users, ported from the code for Google Chrome, released under the same free software license as the original Adblock. The extension was removed on April 2, 2015, by an administrator on Mozilla Add-ons. On December 7, 2015, the official AdBlock site's knowledge base article stated that with version 44 or higher of Firefox desktop and Firefox Mobile, AdBlock will not be supported. The last version of Adblock for those platforms will work on older versions of Firefox. AdBlock was released again on Mozilla Add-ons on November 17, 2016. On April 1, 2012, Adblock developer Michael Gundlach tweaked the code to display LOLcats instead of simply blocking ads. Initially developed as a short-lived April Fools joke, the response was so positive that CatBlock was continued to be offered as an optional add-on supported by a monthly subscription. On October 23, 2014, the developer decided to end official support for CatBlock, and made it open-source, under GPLv3 licensing, as the original extension.

    Read more →
  • Phase correlation

    Phase correlation

    Phase correlation is an approach to estimate the relative translative offset between two similar images (digital image correlation) or other data sets. It is commonly used in image registration and relies on a frequency-domain representation of the data, usually calculated by fast Fourier transforms. The term is applied particularly to a subset of cross-correlation techniques that isolate the phase information from the Fourier-space representation of the cross-correlogram. == Example == The following image demonstrates the usage of phase correlation to determine relative translative movement between two images corrupted by independent Gaussian noise. The image was translated by (20,23) pixels. Accordingly, one can clearly see a peak in the phase-correlation representation at approximately (20,23). == Method == Given two input images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} : Apply a window function (e.g., a Hamming window) on both images to reduce edge effects (this may be optional depending on the image characteristics). Then, calculate the discrete 2D Fourier transform of both images. G a = F { g a } , G b = F { g b } {\displaystyle \ \mathbf {G} _{a}={\mathcal {F}}\{g_{a}\},\;\mathbf {G} _{b}={\mathcal {F}}\{g_{b}\}} Calculate the cross-power spectrum by taking the complex conjugate of the second result, multiplying the Fourier transforms together elementwise, and normalizing this product elementwise. R = G a ∘ G b ∗ | G a ∘ G b ∗ | {\displaystyle \ R={\frac {\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}|}}} Where ∘ {\displaystyle \circ } is the Hadamard product (entry-wise product) and the absolute values are taken entry-wise as well. Written out entry-wise for element index ( j , k ) {\displaystyle (j,k)} : R j k = G a , j k ⋅ G b , j k ∗ | G a , j k ⋅ G b , j k ∗ | {\displaystyle \ R_{jk}={\frac {G_{a,jk}\cdot G_{b,jk}^{}}{|G_{a,jk}\cdot G_{b,jk}^{}|}}} Obtain the normalized cross-correlation by applying the inverse Fourier transform. r = F − 1 { R } {\displaystyle \ r={\mathcal {F}}^{-1}\{R\}} Determine the location of the peak in r {\displaystyle \ r} . ( Δ x , Δ y ) = arg ⁡ max ( x , y ) { r } {\displaystyle \ (\Delta x,\Delta y)=\arg \max _{(x,y)}\{r\}} === Subpixel registration === Commonly, interpolation methods are used to estimate the peak location in the cross-correlogram to non-integer values, despite the fact that the data are discrete, and this procedure is often termed 'subpixel registration'. A large variety of subpixel interpolation methods are given in the technical literature. Common peak interpolation methods such as parabolic interpolation have been used, and the OpenCV computer vision package uses a centroid-based method, though these generally have inferior accuracy compared to more sophisticated methods. Because the Fourier representation of the data has already been computed, it is especially convenient to use the Fourier shift theorem with real-valued (sub-integer) shifts for this purpose, which essentially interpolates using the sinusoidal basis functions of the Fourier transform. An especially popular FT-based estimator is given by Foroosh et al. In this method, the subpixel peak location is approximated by a simple formula involving peak pixel value and the values of its nearest neighbors, where r ( 0 , 0 ) {\displaystyle r_{(0,0)}} is the peak value and r ( 1 , 0 ) {\displaystyle r_{(1,0)}} is the nearest neighbor in the x direction (assuming, as in most approaches, that the integer shift has already been found and the comparand images differ only by a subpixel shift). Δ x = r ( 1 , 0 ) r ( 1 , 0 ) ± r ( 0 , 0 ) {\displaystyle \ \Delta x={\frac {r_{(1,0)}}{r_{(1,0)}\pm r_{(0,0)}}}} The Foroosh et al. method is quite fast compared to most methods, though it is not always the most accurate. Some methods shift the peak in Fourier space and apply non-linear optimization to maximize the correlogram peak, but these tend to be very slow since they must apply an inverse Fourier transform or its equivalent in the objective function. It is also possible to infer the peak location from phase characteristics in Fourier space without the inverse transformation, as noted by Stone. These methods usually use a linear least squares (LLS) fit of the phase angles to a planar model. The long latency of the phase angle computation in these methods is a disadvantage, but the speed can sometimes be comparable to the Foroosh et al. method depending on the image size. They often compare favorably in speed to the multiple iterations of extremely slow objective functions in iterative non-linear methods. Since all subpixel shift computation methods are fundamentally interpolative, the performance of a particular method depends on how well the underlying data conform to the assumptions in the interpolator. This fact also may limit the usefulness of high numerical accuracy in an algorithm, since the uncertainty due to interpolation method choice may be larger than any numerical or approximation error in the particular method. Subpixel methods are also particularly sensitive to noise in the images, and the utility of a particular algorithm is distinguished not only by its speed and accuracy but its resilience to the particular types of noise in the application. == Rationale == The method is based on the Fourier shift theorem. Let the two images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} be circularly-shifted versions of each other: g b ( x , y ) = d e f g a ( ( x − Δ x ) mod M , ( y − Δ y ) mod N ) {\displaystyle \ g_{b}(x,y)\ {\stackrel {\mathrm {def} }{=}}\ g_{a}((x-\Delta x){\bmod {M}},(y-\Delta y){\bmod {N}})} (where the images are M × N {\displaystyle \ M\times N} in size). Then, the discrete Fourier transforms of the images will be shifted relatively in phase: G b ( u , v ) = G a ( u , v ) e − 2 π i ( u Δ x M + v Δ y N ) {\displaystyle \mathbf {G} _{b}(u,v)=\mathbf {G} _{a}(u,v)e^{-2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}} One can then calculate the normalized cross-power spectrum to factor out the phase difference: R ( u , v ) = G a G b ∗ | G a G b ∗ | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ | = e 2 π i ( u Δ x M + v Δ y N ) {\displaystyle {\begin{aligned}R(u,v)&={\frac {\mathbf {G} _{a}\mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\mathbf {G} _{b}^{}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}|}}\\&=e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}\end{aligned}}} since the magnitude of an imaginary exponential always is one, and the phase of G a G a ∗ {\displaystyle \ \mathbf {G} _{a}\mathbf {G} _{a}^{}} always is zero. The inverse Fourier transform of a complex exponential is a Dirac delta function, i.e. a single peak: r ( x , y ) = δ ( x + Δ x , y + Δ y ) {\displaystyle \ r(x,y)=\delta (x+\Delta x,y+\Delta y)} This result could have been obtained by calculating the cross correlation directly. The advantage of this method is that the discrete Fourier transform and its inverse can be performed using the fast Fourier transform, which is much faster than correlation for large images. === Benefits === Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects typical of medical or satellite images. The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation. === Limitations === In practice, it is more likely that g b {\displaystyle \ g_{b}} will be a simple linear shift of g a {\displaystyle \ g_{a}} , rather than a circular shift as required by the explanation above. In such cases, r {\displaystyle \ r} will not be a simple delta function, which will reduce the performance of the method. In such cases, a window function (such as a Gaussian or Tukey window) should be employed during the Fourier transform to reduce edge effects, or the images should be zero padded so that the edge effects can be ignored. If the images consist of a flat background, with all detail situated away from the edges, then a linear shift will be equivalent to a circular shift, and the above derivation will hold exactly. The peak can be sharpened by using edge or vector correlation. For periodic images (such as a chessboard or picket fence), phase correlation may yield ambiguous results with several peaks in the resulting output. == Applications == Phase correlation is the preferred m

    Read more →
  • Braina

    Braina

    Braina is a virtual assistant and speech-to-text dictation application for Microsoft Windows developed by Brainasoft. Braina uses natural language interface, speech synthesis, and speech recognition technology to interact with its users and allows them to use natural language sentences to perform various tasks on a computer. The name Braina is a short form of "Brain Artificial". Braina is marketed as a Microsoft Copilot alternative. It provides a voice interface for several locally run and cloud large language models, including the latest LLMs from providers such as OpenAI, Anthropic, Google, xAI, Meta, Mistral, etc; while improving data privacy. Braina also allows responses from its in-house large language models like Braina Swift and Braina Pinnacle. It has an "Artificial Brain" feature that provides persistent memory support for supported LLMs. == Features == Braina provides is able to carry out various tasks on a computer, including automation. Braina can take commands inputted through typing or through dictation to store reminders, find information online, perform mathematical operations, open files, generate images from text, transcribe speech, and control open windows or programs. Braina adapts to user behavior over time with a goal of better anticipating needs. === Speech-to-text dictation === Braina Pro can type spoken words into an active window at the location of a user's cursor. Its speech recognition technology supports more than 100 languages and dialects and is able to isolate the recognition of a user's voice from disturbing environmental factors such as background noise, other human voices, or external devices. Braina can also be taught to dictate uncommon legal, medical, and scientific terms. Users can also teach Braina uncommon names and vocabulary. Users can edit or correct dictated text without using a keyboard or mouse by giving built-in voice commands. === Text-to-speech === Braina can read aloud selected texts, such as e-books. === Custom commands and automation === Braina can automate computer tasks. It lets users create custom voice commands to perform tasks such as opening files, programs, websites, or emails, as well as executing keyboard or mouse macros. === Transcription === Braina can transcribe media file formats such as WAV, MP3, and MP4 into text. === Notes and reminders === Braina can store and recall notes and reminders. These can include scheduled or unscheduled commands, checklist items, alarms, chat conversations, memos, website snippets, bookmarks, contacts. === Image and Video generation === Braina can generate AI images and videos from text and image inputs using generative cloud AI models. These include Black Forest Labs' FLUX.2, Google's Veo, Imagen, and Nano Banana Pro, Kuaishou's Kling, Alibaba's Wan, ByteDance's Seedance and Seedream, MiniMax's Hailuo, OpenAI's GPT Image, and Tongyi Lab's Z Image Turbo. == Platforms == In addition to the desktop version for Windows operating systems, Braina is also available for the iOS and Android operating systems. The mobile version of Braina has a feature allowing remote management of a Windows PC connected via Wi-Fi. == Distributions == Braina is distributed in multiple modes. These include Braina Lite, a freeware version with limitations, and premium versions Braina Pro, Pro Plus, and Pro Ultra. Some additional features in the Pro version include dictation, custom vocabulary, video transcription, automation, custom voice commands, and persistent LLM memory. == Reception == TechRadar has consistently listed Braina as one of the best dictation and virtual assistant apps between 2015 and 2024.

    Read more →
  • XLNet

    XLNet

    The XLNet was an autoregressive Transformer designed as an improvement over BERT, with 340M parameters and trained on 33 billion words. It was released on 19 June 2019, under the Apache 2.0 license. It achieved state-of-the-art results on a variety of natural language processing tasks, including language modeling, question answering, and natural language inference. == Architecture == The main idea of XLNet is to model language autoregressively like the GPT models, but allow for all possible permutations of a sentence. Concretely, consider the following sentence:My dog is cute.In standard autoregressive language modeling, the model would be tasked with predicting the probability of each word, conditioned on the previous words as its context: We factorize the joint probability of a sequence of words x 1 , … , x T {\displaystyle x_{1},\ldots ,x_{T}} using the chain rule: Pr ( x 1 , … , x T ) = Pr ( x 1 ) Pr ( x 2 | x 1 ) Pr ( x 3 | x 1 , x 2 ) … Pr ( x T | x 1 , … , x T − 1 ) . {\displaystyle \Pr(x_{1},\ldots ,x_{T})=\Pr(x_{1})\Pr(x_{2}|x_{1})\Pr(x_{3}|x_{1},x_{2})\ldots \Pr(x_{T}|x_{1},\ldots ,x_{T-1}).} For example, the sentence "My dog is cute" is factorized as: Pr ( My , dog , is , cute ) = Pr ( My ) Pr ( dog | My ) Pr ( is | My , dog ) Pr ( cute | My , dog , is ) . {\displaystyle \Pr({\text{My}},{\text{dog}},{\text{is}},{\text{cute}})=\Pr({\text{My}})\Pr({\text{dog}}|{\text{My}})\Pr({\text{is}}|{\text{My}},{\text{dog}})\Pr({\text{cute}}|{\text{My}},{\text{dog}},{\text{is}}).} Schematically, we can write it as → My → My dog → My dog is → My dog is cute . {\displaystyle {\texttt {}}{\texttt {}}{\texttt {}}{\texttt {}}\to {\text{My }}{\texttt {}}{\texttt {}}{\texttt {}}\to {\text{My dog }}{\texttt {}}{\texttt {}}\to {\text{My dog is }}{\texttt {}}\to {\text{My dog is cute}}.} However, for XLNet, the model is required to predict the words in a randomly generated order. Suppose we have sampled a randomly generated order 3241, then schematically, the model is required to perform the following prediction task: is dog is dog is cute → My dog is cute {\displaystyle {\texttt {}}{\texttt {}}{\texttt {}}{\texttt {}}\to {\texttt {}}{\texttt {}}{\text{is }}{\texttt {}}\to {\texttt {}}{\text{dog is }}{\texttt {}}\to {\texttt {}}{\text{dog is cute}}\to {\text{My dog is cute}}} By considering all permutations, XLNet is able to capture longer-range dependencies and better model the bidirectional context of words. === Two-Stream Self-Attention === To implement permutation language modeling, XLNet uses a two-stream self-attention mechanism. The two streams are: Content stream: This stream encodes the content of each word, as in standard causally masked self-attention. Query stream: This stream encodes the content of each word in the context of what has gone before. In more detail, it is a masked cross-attention mechanism, where the queries are from the query stream, and the key-value pairs are from the content stream. The content stream uses the causal mask M causal = [ 0 − ∞ − ∞ … − ∞ 0 0 − ∞ … − ∞ 0 0 0 … − ∞ ⋮ ⋮ ⋮ ⋱ ⋮ 0 0 0 … 0 ] {\displaystyle M_{\text{causal}}={\begin{bmatrix}0&-\infty &-\infty &\dots &-\infty \\0&0&-\infty &\dots &-\infty \\0&0&0&\dots &-\infty \\\vdots &\vdots &\vdots &\ddots &\vdots \\0&0&0&\dots &0\end{bmatrix}}} permuted by a random permutation matrix to P M causal P − 1 {\displaystyle PM_{\text{causal}}P^{-1}} . The query stream uses the cross-attention mask P ( M causal − ∞ I ) P − 1 {\displaystyle P(M_{\text{causal}}-\infty I)P^{-1}} , where the diagonal is subtracted away specifically to avoid the model "cheating" by looking at the content stream for what the current masked token is. Like the causal masking for GPT models, this two-stream masked architecture allows the model to train on all tokens in one forward pass. == Training == Two models were released: XLNet-Large, cased: 110M parameters, 24-layer, 1024-hidden, 16-heads XLNet-Base, cased: 340M parameters, 12-layer, 768-hidden, 12-heads. It was trained on a dataset that amounted to 32.89 billion tokens after tokenization with SentencePiece. The dataset was composed of BooksCorpus, and English Wikipedia, Giga5, ClueWeb 2012-B, and Common Crawl. It was trained on 512 TPU v3 chips, for 5.5 days. At the end of training, it still under-fitted the data, meaning it could have achieved lower loss with more training. It took 0.5 million steps with an Adam optimizer, linear learning rate decay, and a batch size of 8192.

    Read more →
  • North Atlantic Population Project

    North Atlantic Population Project

    The North Atlantic Population Project (NAPP) is a collaboration of historical demographers in Britain, Canada, Denmark, Germany, Iceland, Norway, and Sweden to produce a massive census microdata collection for the North Atlantic Region in the late-nineteenth century. The database includes complete individual-level census enumerations for each country, and provides information on over 110 million people. This large scale allows detailed analysis of small geographic areas and population subgroups. The NAPP database is designed to be compatible with the Integrated Public Use Microdata Series (IPUMS), and is disseminated through the IPUMS data-access system at the Minnesota Population Center, University of Minnesota. Major collaborators on the project include Lisa Dillon, University of Montreal; Chad Gaffield, University of Ottawa; Ólöf Garðarsdóttir, Statistics Iceland; Marianne Jarnes Erikstad, University of Tromsø; Jan Oldervall University of Bergen; Evan Roberts, University of Minnesota; Steven Ruggles, University of Minnesota; Kevin Schürer, UK Data Archive; Gunnar Thorvaldsen, University of Tromsø; and Matthew Woollard, UK Data Archive. The project is also coordinated by the Minnesota Population Center at the University of Minnesota.

    Read more →
  • Graph cut optimization

    Graph cut optimization

    Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function f {\displaystyle f} , if it is possible to construct a flow network with positive weights such that each cut C {\displaystyle C} of the network can be mapped to an assignment of variables x {\displaystyle \mathbf {x} } to f {\displaystyle f} (and vice versa), and the cost of C {\displaystyle C} equals f ( x ) {\displaystyle f(\mathbf {x} )} (up to an additive constant) then it is possible to find the global optimum of f {\displaystyle f} in polynomial time by computing a minimum cut of the graph. The mapping between cuts and variable assignments is done by representing each variable with one node in the graph and, given a cut, each variable will have a value of 0 if the corresponding node belongs to the component connected to the source, or 1 if it belong to the component connected to the sink. Not all pseudo-Boolean functions can be represented by a flow network, and in the general case the global optimization problem is NP-hard. There exist sufficient conditions to characterise families of functions that can be optimised through graph cuts, such as submodular quadratic functions. Graph cut optimization can be extended to functions of discrete variables with a finite number of values, that can be approached with iterative algorithms with strong optimality properties, computing one graph cut at each iteration. Graph cut optimization is an important tool for inference over graphical models such as Markov random fields or conditional random fields, and it has applications in computer vision problems such as image segmentation, denoising, registration and stereo matching. == Representability == A pseudo-Boolean function f : { 0 , 1 } n → R {\displaystyle f:\{0,1\}^{n}\to \mathbb {R} } is said to be representable if there exists a graph G = ( V , E ) {\displaystyle G=(V,E)} with non-negative weights and with source and sink nodes s {\displaystyle s} and t {\displaystyle t} respectively, and there exists a set of nodes V 0 = { v 1 , … , v n } ⊂ V − { s , t } {\displaystyle V_{0}=\{v_{1},\dots ,v_{n}\}\subset V-\{s,t\}} such that, for each tuple of values ( x 1 , … , x n ) ∈ { 0 , 1 } n {\displaystyle (x_{1},\dots ,x_{n})\in \{0,1\}^{n}} assigned to the variables, f ( x 1 , … , x n ) {\displaystyle f(x_{1},\dots ,x_{n})} equals (up to a constant) the value of the flow determined by a minimum cut C = ( S , T ) {\displaystyle C=(S,T)} of the graph G {\displaystyle G} such that v i ∈ S {\displaystyle v_{i}\in S} if x i = 0 {\displaystyle x_{i}=0} and v i ∈ T {\displaystyle v_{i}\in T} if x i = 1 {\displaystyle x_{i}=1} . It is possible to classify pseudo-Boolean functions according to their order, determined by the maximum number of variables contributing to each single term. All first order functions, where each term depends upon at most one variable, are always representable. Quadratic functions f ( x ) = w 0 + ∑ i w i ( x i ) + ∑ i < j w i j ( x i , x j ) . {\displaystyle f(\mathbf {x} )=w_{0}+\sum _{i}w_{i}(x_{i})+\sum _{i 0 {\displaystyle p>0} then w i j k ( x i , x j , x k ) = w i j k ( 0 , 0 , 0 ) + p 1 ( x i − 1 ) + p 2 ( x j − 1 ) + p 3 ( x k − 1 ) + p 23 ( x j − 1 ) x k + p 31 x i ( x k − 1 ) + p 12 ( x i − 1 ) x j − p x i x j x k {\displaystyle w_{ijk}(x_{i},x_{j},x_{k})=w_{ijk}(0,0,0)+p_{1}(x_{i}-1)+p_{2}(x_{j}-1)+p_{3}(x_{k}-1)+p_{23}(x_{j}-1)x_{k}+p_{31}x_{i}(x_{k}-1)+p_{12}(x_{i}-1)x_{j}-px_{i}x_{j}x_{k}} with p 1 = w i j k ( 1 , 0 , 1 ) − w i j k ( 0 , 0 , 1 ) p 2 = w i j k ( 1 , 1 , 0 ) − w i j k ( 1 , 0 , 1 ) p 3 = w i j k ( 0 , 1 , 1 ) − w i j k ( 0 , 1 , 0 ) p 23 = w i j k ( 0 , 0 , 1 ) + w i j k ( 0 , 1 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 0 , 1 , 1 ) p 31 = w i j k ( 0 , 0 , 1 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 0 , 1 ) p 12 = w i j k ( 0 , 1 , 0 ) + w i j k ( 1 , 0 , 0 ) − w i j k ( 0 , 0 , 0 ) − w i j k ( 1 , 1 , 0 ) . {\displaystyle {\begin{aligned}p_{1}&=w_{ijk}(1,0,1)-w_{ijk}(0,0,1)\\p_{2}&=w_{ijk}(1,1,0)-w_{ijk}(1,0,1)\\p_{3}&=w_{ijk}(0,1,1)-w_{ijk}(0,1,0)\\p_{23}&=w_{ijk}(0,0,1)+w_{ijk}(0,1,0)-w_{ijk}(0,0,0)-w_{ijk}(0,1,1)\\p_{31}&=w_{ijk}(0,0,1)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,0,1)\\p_{12}&=w_{ijk}(0,1,0)+w_{ijk}(1,0,0)-w_{ijk}(0,0,0)-w_{ijk}(1,1

    Read more →
  • Vector database

    Vector database

    A vector database, vector store or vector search engine is a database that stores and retrieves embeddings of data in vector space. Vector databases typically implement approximate nearest neighbor algorithms so users can search for records semantically similar to a given input, unlike traditional databases which primarily look up records by exact match. Use-cases for vector databases include similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation (RAG). Vector embeddings are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, with the number of dimensions ranging from a few hundred to tens of thousands, depending on the complexity of the data being represented. Each data item is represented by one vector in this space. Words, phrases, or entire documents, as well as images, audio, and other types of data, can all be vectorized. These feature vectors may be computed from the raw data using machine learning methods such as feature extraction algorithms, word embeddings or deep learning networks. The goal is that semantically similar data items receive feature vectors close to each other. Vector retrieval can be combined with metadata filtering or lexical search to support filtered and hybrid retrieval workflows. == Techniques == Common techniques for similarity search on high-dimensional vectors include: Hierarchical Navigable Small World (HNSW) graphs Locality-sensitive hashing (LSH) and sketching Product quantization (PQ) Inverted files These techniques may also be combined in vector search systems. In recent benchmarks, HNSW-based implementations have been among the best performers. Conferences such as the International Conference on Similarity Search and Applications (SISAP) and the Conference on Neural Information Processing Systems (NeurIPS) have hosted competitions on vector search in large databases. == Applications == Vector databases are used in a wide range of machine learning applications including similarity search, semantic search, multi-modal search, recommendations engines, object detection, and retrieval-augmented generation. === Retrieval-augmented generation === An especially common use-case for vector databases is in retrieval-augmented generation (RAG), a method to improve domain-specific responses of large language models. The retrieval component of a RAG can be any search system, but is most often implemented as a vector database. Text documents describing the domain of interest are collected, and for each document or document section, a feature vector (known as an "embedding") is computed, typically using a deep learning network, and stored in a vector database along with a link to the document. Given a user prompt, the feature vector of the prompt is computed, and the database is queried to retrieve the most relevant documents. These are then automatically added into the context window of the large language model, and the large language model proceeds to create a response to the prompt given this context. == Implementations ==

    Read more →
  • Latent semantic analysis

    Latent semantic analysis

    Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text (the distributional hypothesis). A matrix containing word counts per document (rows represent unique words and columns represent each document) is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents. An information retrieval technique using latent semantic structure was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI). == Overview == === Occurrence matrix === LSA can use a document-term matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to terms and whose columns correspond to documents. A typical example of the weighting of the elements of the matrix is tf-idf (term frequency–inverse document frequency): the weight of an element of the matrix is proportional to the number of times the terms appear in each document, where rare terms are upweighted to reflect their relative importance. This matrix is also common to standard semantic models, though it is not necessarily explicitly expressed as a matrix, since the mathematical properties of matrices are not always used. === Rank lowering === After the construction of the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. There could be various reasons for these approximations: The original term-document matrix is presumed too large for the computing resources; in this case, the approximated low rank matrix is interpreted as an approximation (a "least and necessary evil"). The original term-document matrix is presumed noisy: for example, anecdotal instances of terms are to be eliminated. From this point of view, the approximated matrix is interpreted as a de-noisified matrix (a better matrix than the original). The original term-document matrix is presumed overly sparse relative to the "true" term-document matrix. That is, the original matrix lists only the words actually in each document, whereas we might be interested in all words related to each document—generally a much larger set due to synonymy. The consequence of the rank lowering is that some dimensions are combined and depend on more than one term: {(car), (truck), (flower)} → {(1.3452 car + 0.2828 truck), (flower)} This mitigates the problem of identifying synonymy, as the rank lowering is expected to merge the dimensions associated with terms that have similar meanings. It also partially mitigates the problem with polysemy, since components of polysemous words that point in the "right" direction are added to the components of words that share a similar meaning. Conversely, components that point in other directions tend to either simply cancel out, or, at worst, to be smaller than components in the directions corresponding to the intended sense. === Derivation === Let X {\displaystyle X} be a matrix where element ( i , j ) {\displaystyle (i,j)} describes the occurrence of term i {\displaystyle i} in document j {\displaystyle j} (this can be, for example, the frequency). X {\displaystyle X} will look like this: d j ↓ t i T → [ x 1 , 1 … x 1 , j … x 1 , n ⋮ ⋱ ⋮ ⋱ ⋮ x i , 1 … x i , j … x i , n ⋮ ⋱ ⋮ ⋱ ⋮ x m , 1 … x m , j … x m , n ] {\displaystyle {\begin{matrix}&{\textbf {d}}_{j}\\&\downarrow \\{\textbf {t}}_{i}^{T}\rightarrow &{\begin{bmatrix}x_{1,1}&\dots &x_{1,j}&\dots &x_{1,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{m,1}&\dots &x_{m,j}&\dots &x_{m,n}\\\end{bmatrix}}\end{matrix}}} Now a row in this matrix will be a vector corresponding to a term, giving its relation to each document: t i T = [ x i , 1 … x i , j … x i , n ] {\displaystyle {\textbf {t}}_{i}^{T}={\begin{bmatrix}x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\end{bmatrix}}} Likewise, a column in this matrix will be a vector corresponding to a document, giving its relation to each term: d j = [ x 1 , j ⋮ x i , j ⋮ x m , j ] {\displaystyle {\textbf {d}}_{j}={\begin{bmatrix}x_{1,j}\\\vdots \\x_{i,j}\\\vdots \\x_{m,j}\\\end{bmatrix}}} Now the dot product t i T t p {\displaystyle {\textbf {t}}_{i}^{T}{\textbf {t}}_{p}} between two term vectors gives the correlation between the terms over the set of documents. The matrix product X X T {\displaystyle XX^{T}} contains all these dot products. Element ( i , p ) {\displaystyle (i,p)} (which is equal to element ( p , i ) {\displaystyle (p,i)} ) contains the dot product t i T t p {\displaystyle {\textbf {t}}_{i}^{T}{\textbf {t}}_{p}} ( = t p T t i {\displaystyle ={\textbf {t}}_{p}^{T}{\textbf {t}}_{i}} ). Likewise, the matrix X T X {\displaystyle X^{T}X} contains the dot products between all the document vectors, giving their correlation over the terms: d j T d q = d q T d j {\displaystyle {\textbf {d}}_{j}^{T}{\textbf {d}}_{q}={\textbf {d}}_{q}^{T}{\textbf {d}}_{j}} . Now, from the theory of linear algebra, there exists a decomposition of X {\displaystyle X} such that U {\displaystyle U} and V {\displaystyle V} are orthogonal matrices and Σ {\displaystyle \Sigma } is a diagonal matrix. This is called a singular value decomposition (SVD): X = U Σ V T {\displaystyle {\begin{matrix}X=U\Sigma V^{T}\end{matrix}}} The matrix products giving us the term and document correlations then become X X T = ( U Σ V T ) ( U Σ V T ) T = ( U Σ V T ) ( V T T Σ T U T ) = U Σ V T V Σ T U T = U Σ Σ T U T X T X = ( U Σ V T ) T ( U Σ V T ) = ( V T T Σ T U T ) ( U Σ V T ) = V Σ T U T U Σ V T = V Σ T Σ V T {\displaystyle {\begin{matrix}XX^{T}&=&(U\Sigma V^{T})(U\Sigma V^{T})^{T}=(U\Sigma V^{T})(V^{T^{T}}\Sigma ^{T}U^{T})=U\Sigma V^{T}V\Sigma ^{T}U^{T}=U\Sigma \Sigma ^{T}U^{T}\\X^{T}X&=&(U\Sigma V^{T})^{T}(U\Sigma V^{T})=(V^{T^{T}}\Sigma ^{T}U^{T})(U\Sigma V^{T})=V\Sigma ^{T}U^{T}U\Sigma V^{T}=V\Sigma ^{T}\Sigma V^{T}\end{matrix}}} Since Σ Σ T {\displaystyle \Sigma \Sigma ^{T}} and Σ T Σ {\displaystyle \Sigma ^{T}\Sigma } are diagonal we see that U {\displaystyle U} must contain the eigenvectors of X X T {\displaystyle XX^{T}} , while V {\displaystyle V} must be the eigenvectors of X T X {\displaystyle X^{T}X} . Both products have the same non-zero eigenvalues, given by the non-zero entries of Σ Σ T {\displaystyle \Sigma \Sigma ^{T}} , or equally, by the non-zero entries of Σ T Σ {\displaystyle \Sigma ^{T}\Sigma } . Now the decomposition looks like this: X U Σ V T ( d j ) ( d ^ j ) ↓ ↓ ( t i T ) → [ x 1 , 1 … x 1 , j … x 1 , n ⋮ ⋱ ⋮ ⋱ ⋮ x i , 1 … x i , j … x i , n ⋮ ⋱ ⋮ ⋱ ⋮ x m , 1 … x m , j … x m , n ] = ( t ^ i T ) → [ [ u 1 ] … [ u l ] ] ⋅ [ σ 1 … 0 ⋮ ⋱ ⋮ 0 … σ l ] ⋅ [ [ v 1 ] ⋮ [ v l ] ] {\displaystyle {\begin{matrix}&X&&&U&&\Sigma &&V^{T}\\&({\textbf {d}}_{j})&&&&&&&({\hat {\textbf {d}}}_{j})\\&\downarrow &&&&&&&\downarrow \\({\textbf {t}}_{i}^{T})\rightarrow &{\begin{bmatrix}x_{1,1}&\dots &x_{1,j}&\dots &x_{1,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{i,1}&\dots &x_{i,j}&\dots &x_{i,n}\\\vdots &\ddots &\vdots &\ddots &\vdots \\x_{m,1}&\dots &x_{m,j}&\dots &x_{m,n}\\\end{bmatrix}}&=&({\hat {\textbf {t}}}_{i}^{T})\rightarrow &{\begin{bmatrix}{\begin{bmatrix}\,\\\,\\{\textbf {u}}_{1}\\\,\\\,\end{bmatrix}}\dots {\begin{bmatrix}\,\\\,\\{\textbf {u}}_{l}\\\,\\\,\end{bmatrix}}\end{bmatrix}}&\cdot &{\begin{bmatrix}\sigma _{1}&\dots &0\\\vdots &\ddots &\vdots \\0&\dots &\sigma _{l}\\\end{bmatrix}}&\cdot &{\begin{bmatrix}{\begin{bmatrix}&&{\textbf {v}}_{1}&&\end{bmatrix}}\\\vdots \\{\begin{bmatrix}&&{\textbf {v}}_{l}&&\end{bmatrix}}\end{bmatrix}}\end{matrix}}} The values σ 1 , … , σ l {\displaystyle \sigma _{1},\dots ,\sigma _{l}} are called the singular values, and u 1 , … , u l {\displaystyle u_{1},\dots ,u_{l}} and v 1 , … , v l {\displaystyle v_{1},\dots ,v_{l}} the left and right singular vectors. Notice the only part of U {\displaystyle U} that contributes to t i {\displaystyle {\textbf {t}}_{i}} is the i 'th {\displaystyle i{\textrm {'th}}} row. Let this row vector be called t ^ i T {\displaystyle {\hat {\textrm {t}}}_{i}^{T}} . Likewise, the only part of V T {\displaystyle V^{T}} that contributes to d j {\displaystyle {\textbf {d}}_{j}} is the j 'th {\displaystyle j{\textrm {'th}}} column, d ^ j {\displaystyle {\hat {\textrm {d}}}_{j}} . These are not the eigenvectors, but depend on all the eigenvectors. I

    Read more →
  • Depth peeling

    Depth peeling

    In computer graphics, depth peeling is an exact multipass method of order-independent transparency that extracts transparent fragments into depth layers and composites those layers in depth order. Depth peeling has the advantage of being able to generate correct results even for complex images containing intersecting transparent objects. == Method == Depth peeling works by rendering the image multiple times. Depth peeling uses two Z buffers, one that works conventionally, and one that is not modified, and sets the minimum distance at which a fragment can be drawn without being discarded. For each pass, the previous pass' conventional Z-buffer is used as the minimal Z-buffer, so each pass removes already-captured nearer fragments and draws the next depth layer behind them. The resulting images can then be composited in depth order to form a single image. A major drawback of classical depth peeling is performance: it requires one geometry pass per peeled layer, so scenes with high depth complexity require many passes that each re-rasterize the transparent geometry. Later variants reduce the number of passes by peeling multiple layers or both front and back layers in a pass. Dual depth peeling reduces the geometry-pass count from N to N/2+1 by peeling one layer from the front and one from the back in each pass, while multi-layer depth peeling peels several layers per pass and reported up to an 8x speed-up in RGBA8 settings.

    Read more →
  • Open Syllabus Project

    Open Syllabus Project

    The Open Syllabus Project (OSP) is an online open-source platform that catalogs and analyzes millions of college syllabi. Founded by researchers from the American Assembly at Columbia University, the OSP has amassed the most extensive collection of searchable syllabi. Since its beta launch in 2016, the OSP has collected over 7 million course syllabi from over 80 countries, primarily by scraping publicly accessible university websites. The project is directed by Joe Karaganis. == History == The OSP was formed by a group of data scientists, sociologists, and digital-humanities researchers at the American Assembly, a public-policy institute based at Columbia University. The OSP was partly funded by the Sloan Foundation and the Arcadia Fund. Joe Karaganis, former vice-president of the American Assembly, serves as the project director of the OSP. The project builds on prior attempts to archive syllabi, such as H-Net, MIT OpenCourseWare, and historian Dan Cohen's defunct Syllabus Finder website (Cohen now sits on the OSP's advisory board). The OSP became a non-profit and independent of the American Assembly in November 2019. In January 2016, the OSP launched a beta version of their "Syllabus Explorer," which they had collected data for since 2013. The Syllabus Explorer allows users to browse and search texts from over one million college course syllabi. The OSP launched a more comprehensive version 2.0 of the Syllabus Explorer in July 2019. The newer version includes an interactive visualization that displays texts as dots on a knowledge map. As of 2022, the OSP has collected over 7 million course syllabi. The Syllabus Explorer represents the "largest collection of searchable syllabi ever amassed." == Methodology == The OSP has collected syllabi data from over 80 countries dating to 2000. The syllabi stem from over 4,000 worldwide institutions. Most of the OSP's data originates from the United States. Canada, Australia, and the U.K also have large datasets. The OSP primarily collects syllabi by scraping publicly accessible university websites. The OSP also allows syllabi submissions from faculty, students, and administrators. The OSP developers use machine learning and natural language processing to extract metadata from such syllabi. Since only metadata is collected, no individual syllabus or personal identifying information is found in the OSP database. The OSP classifies the syllabi into 62 subject fields – corresponding to the U.S. Department of Education's Classification of Instructional Programs (CIP). Additionally, the OSP assigns each text a "teaching score" from 0–100. This score represents the text's percentile rank among citations in the total citation count and is a numerical indicator of the relative frequency of which a particular work is taught. The OSP also has data on which texts are most likely to be assigned together. The developers behind the OSP admit that the database is incomplete and likely contains "a fair number of errors." Karaganis estimates that 80–100 million syllabi exist in the United States alone. The OSP is unable to access syllabi behind private course-management software like Blackboard. == Notable findings == === Anthropology === Using data from the OSP, anthropologist Laurence Ralph uncovered that black anthropologists are "woefully under-represented in (if not erased from) most anthropology syllabi." Black authors wrote less than 1 percent of the top 1,000 assigned works. === Economics === The database indicates Greg Mankiw is the most frequently cited author for college economics courses. === English literature === The OSP found that Mary Shelley's Frankenstein was the most widely taught novel in college courses. Additionally, the majority of novels published after 1945 taught in English classes were historical fiction. === Female writers === The most read female writer on college campuses is Kate L. Turabian for her A Manual for Writers of Research Papers, Theses, and Dissertations . Turabian is followed by Diana Hacker, Toni Morrison, Jane Austen, and Virginia Woolf. === Film === The most assigned film according to the OSP is the 1929 Soviet documentary film, Man with a Movie Camera. English filmmaker Alfred Hitchcock is the most assigned director in college courses. === History === Historians George Brown Tindall and David Emory Shi's America: A Narrative History is the number one assigned textbook for history, followed by Anne Moody's memoir, Coming of Age in Mississippi. === Philosophy === The most assigned texts in the field of philosophy include Aristotle's Nicomachean Ethics, John Stuart Mill's Utilitarianism, and Plato's Republic. Plato's Republic was also the second most assigned text in universities in the English-speaking world (only behind Strunk and White's Elements of Style). === Physics === David Halliday's et al. Fundamentals of Physics is the number one ranked physics textbook in the OSP's database. === Political science === Data from the OSP indicates that the dominant political science texts are written almost exclusively by white men and scholars based in the West. In the top 200 most-frequently assigned works, 15 are authored by at least one woman. === Public administration === American president Woodrow Wilson's article "The Study of Administration" was the most frequently assigned text in public affairs and administration syllabi. == Reception == According to William Germano et al., the OSP is a "fascinating resource but is also prone to misrepresenting or at least distracting us from the most important business of a syllabus: communicating with students." Historian William Caferro remarks that the OSP is a "tacit experience of sharing, but a useful one." English professor Bart Beaty writes that, "Despite the many reservations about the completeness of its data, the OSP provides a rare opportunity for scholars to move beyond the anecdotal in discussions of canon-formation in teaching." Media theorist Elizabeth Losh opines that "big data approaches", like the OSP, may "raise troubling questions for instructors about informed consent, pedagogical privacy, and quantified metrics."

    Read more →
  • Example-based machine translation

    Example-based machine translation

    Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning. == Translation by analogy == At the foundation of example-based machine translation is the idea of translation by analogy. When applied to the process of human translation, the idea that translation takes place by analogy is a rejection of the idea that people translate sentences by doing deep linguistic analysis. Instead, it is founded on the belief that people translate by first decomposing a sentence into certain phrases, then by translating these phrases, and finally by properly composing these fragments into one long sentence. Phrasal translations are translated by analogy to previous translations. The principle of translation by analogy is encoded to example-based machine translation through the example translations that are used to train such a system. Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation. == History == Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially adapted to translation between two totally different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in another language, therefore, it is no use to do the deep linguistic analysis characteristic of rule-based machine translation. == Example == Example-based machine translation systems are trained from bilingual parallel corpora containing sentence pairs like the example shown in the table above. Sentence pairs contain sentences in one language with their translations into another. The particular example shows an example of a minimal pair, meaning that the sentences vary by just one element. These sentences make it simple to learn translations of portions of a sentence. For example, an example-based machine translation system would learn three units of translation from the above example: How much is that X ? corresponds to Ano X wa ikura desu ka. red umbrella corresponds to akai kasa small camera corresponds to chiisai kamera Composing these units can be used to produce novel translations in the future. For example, if we have been trained using some text containing the sentences: President Kennedy was shot dead during the parade. and The convict escaped on July 15th., then we could translate the sentence The convict was shot dead during the parade. by substituting the appropriate parts of the sentences. == Phrasal verbs == Example-based machine translation is best suited for sub-language phenomena like phrasal verbs. Phrasal verbs have highly context-dependent meanings. They are common in English, where they comprise a verb followed by an adverb and/or a preposition, which are called the particle to the verb. Phrasal verbs produce specialized context-specific meanings that may not be derived from the meaning of the constituents. There is almost always an ambiguity during word-to-word translation from source to the target language. As an example, consider the phrasal verb "put on" and its Hindustani translation. It may be used in any of the following ways: Ram put on the lights. (Switched on) (Hindustani translation: Jalana) Ram put on a cap. (Wear) (Hindustani translation: Pahenna)

    Read more →