PowerBuilder is an integrated development environment owned by SAP since the acquisition of Sybase in 2010. On July 5, 2016, SAP and Appeon entered into an agreement whereby Appeon, an independent company, would be responsible for developing, selling, and supporting PowerBuilder. Over the years, PowerBuilder has been updated with new standards. In 2010, a major upgrade of PowerBuilder was released to provide support for the Microsoft .NET Framework. In 2014, support was added for OData, dockable windows, and 64-bit native applications. In 2019 support was added for rapidly creating RESTful Web APIs and non-visual .NET assemblies using the C# language and the .NET Core framework. And PowerScript client app development was revamped with new UI technologies and cloud architecture. In 2025 the IDE was revamped with new code editor and ultra-fast compiler. Appeon has been releasing new features every 6-12 month cycles, which per the product roadmap focus on four key focus areas: sustaining core features, modernizing application UI, improving developer productivity, and incorporating more Cloud technology. == Features == PowerBuilder has a native data-handling component called a DataWindow, which can be used to create, edit, and display data from a database. This object gives the programmer a number of tools for specifying and controlling user interface appearance and behavior, and also provides simplified access to database content and JSON or XML from Web services. To some extent, the DataWindow frees the programmer from considering the differences between Database Management Systems from different vendors. DataWindow can display data using multiple presentation styles and can connect to various data sources. == Usage == PowerBuilder is used primarily for building business-oriented CRUD applications. Although new software products are rarely built with PowerBuilder, many client-server ERP products and line-of-business applications built in the late 1980s to early 2000s with PowerBuilder still provide core database functions for large enterprises in government, higher education, manufacturing, insurance, banking, energy, and telecommunications. == History == === Early history === PowerBuilder originated from Computer Solutions Inc. (CSI), a software consulting firm founded in 1974 by Mitchell Kertzman in Massachusetts. CSI developed GrowthPower, an MRP II software package with integrated financial modules released in 1981, which ran exclusively on the HP 3000 platform and achieved over 1,000 customer installations at its peak. In the late 1980s, as demand increased for graphical user interfaces amid the rise of Microsoft Windows, Kertzman partnered with Dave Litwack, former executive vice president of product development at Cullinet Software (acquired by Computer Associates in 1989). Litwack joined the company in 1988 as head of research and development to develop a client/server GUI tool, leading to its rebranding as Powersoft Corporation in 1990. PowerBuilder 1.0 was released in July 1991 as a rapid application development tool featuring the DataWindow and PowerScript language. Powersoft went public on February 3, 1993, with shares closing at $38 from an initial $20 price. Sybase announced its acquisition of Powersoft on November 15, 1994, in a stock swap valued at approximately $940 million; the merger closed on February 14, 1995, at a revised value of about $904 million due to Sybase's stock fluctuations. === Recent history === In December 2013 SAP announced the new version going directly to number 15 and released a beta version. Key features included support for the .NET Framework v4.5, SQL Server 2012, Oracle 12, Windows 8, OData and Dockable Windows. SAP later released this as version 12.6. On May 31, 2019, PowerBuilder 2019 was released by Appeon. This release supports C# development. It provides a new C# IDE, .NET data access objects, C# migration solution, Web API client, and UI themes. On April 3, 2020, PowerBuilder 2019 R2 was launched by Appeon. This release includes a first-ever PowerScript-to-C# code converter, which can automatically migrate 80-95% of PowerBuilder business logic and DataWindows to C#. Interoperability between PowerScript and .NET programming languages is also now supported. Many existing features have also been enhanced. On January 22, 2021, PowerBuilder 2019 R3 was launched by Appeon. This release provides a groundbreaking new app deployment technology called PowerClient, which securely automates the installation and update of client apps over HTTPS. C# Web API development has been greatly enhanced with asynchronous programming and support for Amazon Aurora and Azure cloud databases. Aside from many other new features, PowerBuilder 2019 R3 is a long-term support (LTS) version that replaces previous LTS versions On August 6, 2021, PowerBuilder 2021 was launched by Appeon. The Cloud deployment capability of the PowerBuilder 2021 IDE, in conjunction with the matching PowerServer 2021 runtime, was revamped, bringing PowerBuilder up-to-date with the latest .NET technologies. The presentation layer now executes PowerScript natively on Windows devices. The middle-tier has been rebuilt around REST API standard with a pure .NET Core implementation. A new CI/CD utility that integrates with Git/SVN and Jenkins, witch compiles all PowerBuilder projects using the command-line interface, was added alongside other features. On September 4, 2022, PowerBuilder 2022 was launched by Appeon. This release brings enhancements to the productivity of developing both client/server & installable cloud apps and more security measures to safeguard your apps. It includes many new features, including Windows 11 support, introducing time-saving functionalities to the IDE, such as Tabbed Code Editor, Jump to Objects, and Quick Code Search, and supports the latest HTTP/2 and TLS 1.3 protocols and two-way TLS authentication. On August 4, 2023, PowerBuilder 2022 R2 was launched by Appeon. This release introduces a range of new features aimed at helping developers build powerful, feature-rich, and secure client/server and installable cloud apps more efficiently, including tabbed windows, fillable PDFs, and SMTP client. On January 8, 2024, PowerBuilder 2022 R3 was launched by Appeon. This release is a long-term support version. Features previously released in earlier releases have been enhanced and/or corrected. On May 7, 2025, PowerBuilder 2025 was launched by Appeon. This release delivers a revamped IDE that boosts developer productivity throughout the SLDC—from writing and extending code to debugging, automating builds, and deploying applications. It features a new-generation code editor, ultra-fast compiler, automatic REST API creation, faster GIT operations, and codeless UI modernization features. == Features == PowerBuilder is an object-oriented programming language. Nearly all of the visual and non-visual objects support inheritance, polymorphism, and encapsulation. The programmer may utilize a common code framework such as PowerBuilder Foundation Classes, also known as PFC, to inherit objects from and leverage pre-existing code. The DataWindow is the key component (and selling point) of PowerBuilder. The DataWindow offers a visual SQL painter which supports outer joins, unions and subquery operations. It can convert SQL to visual representation and back, so the developer can use native SQL if desired. DataWindow updates are automatic — it produces the proper SQL at runtime based on the DBMS to which the user is currently connected. This feature makes it easier for developers who are not experienced with SQL. The DataWindow also has the built-in ability to both retrieve data and update data via stored procedures or REST Web APIs as well as import/export JSON data. The RESTClient object introduced in PowerBuilder 2017 facilitates bridging the DataWindow with REST Web APIs and requiring minimal coding. === RDBMS interfaces === PowerBuilder offers native interfaces to all major databases, as well as ODBC and OLE-DB, in the Enterprise version. There are many connectivity options that allow performance monitoring and tuning, such as: Integrated security Tracing of all SQL Isolation level Password expiration dialog Blocking factor Number of SQL statements to cache Use connection pool Thread safety Trace ODBC API calls Due to the information about the database schema (such as primary key information) that are stored in PowerBuilder's data dictionary, the code required to implement data display and browsing is greatly simplified, because the dictionary information allows generation of the appropriate SQL behind the scenes. PowerBuilder supports the following ways of interacting with a database: DataWindow this is the simplest approach, relying on automatically generated SQL. Embedded SQL Embedded SQL supports SELECT, INSERT, UPDATE, DELETE and cursors. This option is used when the developer desires more control than is available with the
Wadhwani Institute for Artificial Intelligence
Wadhwani AI, based in Mumbai, Maharashtra, is an independent, non-profit institute. Founded in 2018, it is dedicated to developing Artificial intelligence solutions for social good. Their mission is to build AI-based innovations and solutions for underserved communities in developing countries, for a wide range of domains including agriculture, education, financial inclusion, healthcare, and infrastructure. == History and funding == The institute was founded with a $30 million philanthropic effort by the Wadhwani brothers, Romesh Wadhwani and Sunil Wadhwani. The institute was inaugurated and dedicated to the nation by Narendra Modi, the 14th Prime Minister of India. In 2019, the institute received a $2 million grant from Google.org to create technologies to help reduce crop losses in cotton farming, through integrated pest management. The United States Agency for International Development awarded $2 million to the institute in 2020 to develop tools, using mathematical modeling techniques and digital technologies such as artificial intelligence and machine learning, to forecast COVID-19 disease patterns, estimate resources needed, and plan interventions. == Collaboration == With assistance from Google, the Ministry of Agriculture and Farmers' Welfare and the Wadhwani AI developed Krishi 24/7, the first AI-powered automated agricultural news monitoring and analysis tool. Through better decision-making, Krishi 24/7 will support the identification of valuable news, provide timely notifications, and respond quickly to safeguard farmers' interests and advance sustainable agricultural growth. The application converts news articles into English after scanning them in several languages. It ensures that the ministry is informed in a timely manner about pertinent occurrences that are published online by extracting key information from news items, including the headline, crop name, event type, date, location, severity, summary, and source link. The National Center for Disease Control has effectively implemented a comparable automated surveillance and analysis tool for disease outbreaks.
Perceptron
In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers. A binary classifier is a function that can decide whether or not an input, represented by a vector of numbers, belongs to some specific class. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector. == History == The artificial neuron and artificial neural network were invented in 1943 by Warren McCulloch and Walter Pitts in their seminal paper "A Logical Calculus of the Ideas Immanent in Nervous Activity". In 1957, Frank Rosenblatt was at the Cornell Aeronautical Laboratory. He simulated the perceptron on an IBM 704. Later, he obtained funding by the Information Systems Branch of the United States Office of Naval Research and the Rome Air Development Center, to build a custom-made computer, the Mark I Perceptron. It was first publicly demonstrated on 23 June 1960. The machine was "part of a previously secret four-year NPIC [the US' National Photographic Interpretation Center] effort from 1963 through 1966 to develop this algorithm into a useful tool for photo-interpreters". Rosenblatt described the details of the perceptron in a 1958 paper. His organization of a perceptron is constructed of three kinds of cells ("units"): S, A, R, which stand for "sensory", "association" and "response". He presented at the first international symposium on AI, Mechanisation of Thought Processes, which took place in 1958 November. Rosenblatt's project was funded under Contract Nonr-401(40) "Cognitive Systems Research Program", which lasted from 1959 to 1970, and Contract Nonr-2381(00) "Project PARA" ("PARA" means "Perceiving and Recognition Automata"), which lasted from 1957 to 1963. In 1959, the Institute for Defense Analysis awarded his group a $10,000 contract. By September 1961, the ONR awarded further $153,000 worth of contracts, with $108,000 committed for 1962. The ONR research manager, Marvin Denicoff, stated that ONR, instead of ARPA, funded the Perceptron project, because the project was unlikely to produce technological results in the near or medium term. Funding from ARPA go up to the order of millions dollars, while from ONR are on the order of 10,000 dollars. Meanwhile, the head of IPTO at ARPA, J.C.R. Licklider, was interested in 'self-organizing', 'adaptive' and other biologically-inspired methods in the 1950s; but by the mid-1960s he was openly critical of these, including the perceptron. Instead he strongly favored the logical AI approach of Simon and Newell. === Mark I Perceptron machine === The perceptron was intended to be a machine, rather than a program, and while its first implementation was in software for the IBM 704, it was subsequently implemented in custom-built hardware as the Mark I Perceptron with the project name "Project PARA", designed for image recognition. The machine is currently in Smithsonian National Museum of American History. The Mark I Perceptron had three layers. One version was implemented as follows: An array of 400 photocells arranged in a 20x20 grid, named "sensory units" (S-units), or "input retina". Each S-unit can connect to up to 40 A-units. A hidden layer of 512 perceptrons, named "association units" (A-units). An output layer of eight perceptrons, named "response units" (R-units). Rosenblatt called this three-layered perceptron network the alpha-perceptron, to distinguish it from other perceptron models he experimented with. The S-units are connected to the A-units randomly (according to a table of random numbers) via a plugboard (see photo), to "eliminate any particular intentional bias in the perceptron". The connection weights are fixed, not learned. Rosenblatt was adamant about the random connections, as he believed the retina was randomly connected to the visual cortex, and he wanted his perceptron machine to resemble human visual perception. The A-units are connected to the R-units, with adjustable weights encoded in potentiometers, and weight updates during learning were performed by electric motors.The hardware details are in an operators' manual. In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." The Photo Division of Central Intelligence Agency, from 1960 to 1964, studied the use of Mark I Perceptron machine for recognizing militarily interesting silhouetted targets (such as planes and ships) in aerial photos. === Principles of Neurodynamics (1962) === Rosenblatt described his experiments with many variants of the Perceptron machine in a book Principles of Neurodynamics (1962). The book is a published version of the 1961 report. Among the variants are: "cross-coupling" (connections between units within the same layer) with possibly closed loops, "back-coupling" (connections from units in a later layer to units in a previous layer), four-layer perceptrons where the last two layers have adjustable weights (and thus a proper multilayer perceptron), incorporating time-delays to perceptron units, to allow for processing sequential data, analyzing audio (instead of images). The machine was shipped from Cornell to Smithsonian in 1967, under a government transfer administered by the Office of Naval Research. === Perceptrons (1969) === Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognise many classes of patterns. This caused the field of neural network research to stagnate for many years, before it was recognised that a feedforward neural network with two or more layers (also called a multilayer perceptron) had greater processing power than perceptrons with one layer (also called a single-layer perceptron). Single-layer perceptrons are only capable of learning linearly separable patterns. For a classification task with some step activation function, a single node will have a single line dividing the data points forming the patterns. More nodes can create more dividing lines, but those lines must somehow be combined to form more complex classifications. A second layer of perceptrons, or even linear nodes, are sufficient to solve many otherwise non-separable problems. In 1969, a famous book entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often incorrectly believed that they also conjectured that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons were capable of producing an XOR function. (See the page on Perceptrons (book) for more information.) Nevertheless, the often-miscited Minsky and Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. This text was reprinted in 1987 as "Perceptrons - Expanded Edition" where some errors in the original text are shown and corrected. === Subsequent work === Rosenblatt continued working on perceptrons despite diminishing funding. The last attempt was Tobermory, built between 1961 and 1967, built for speech recognition. It occupied an entire room. It had 4 layers with 12,000 weights implemented by toroidal magnetic cores. By the time of its completion, simulation on digital computers had become faster than purpose-built perceptron machines. He died in a boating accident in 1971. A simulation program for neural networks was written for IBM 7090/7094, and was used to study various pattern recognition applications, such as character recognition, particle tracks in bubble-chamber photographs; phoneme, isolated word, and continuous speech recognition; speaker verification; and center-of-attention mechanisms for image processing. The kernel perceptron algorithm was already introduced in 1964 by Aizerman et al. Margin bounds guarantees were given for the Perceptron algorithm in the general non-separable case first by Freund and Schapire (1998), and more recently by Mohri and Rostamizadeh (2013) who extend previous results and give new and more favorable L1 bounds. The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research suggests a perceptron-like linear model can produce some behavior seen in real neurons. The solution spaces of decision boundaries for all binary functions and learning behaviors are studied in. == Definition == In the modern sense, the perceptron is an algori
Self-organizing map
A self-organizing map (SOM) or self-organizing feature map (SOFM) is an unsupervised machine learning technique used to produce a low-dimensional (typically two-dimensional) representation of a higher-dimensional data set while preserving the topological structure of the data. For example, a data set with p {\displaystyle p} variables measured in n {\displaystyle n} observations could be represented as clusters of observations with similar values for the variables. These clusters then could be visualized as a two-dimensional "map" such that observations in proximal clusters have more similar values than observations in distal clusters. This can make high-dimensional data easier to visualize and analyze. A SOM is a type of artificial neural network but is trained using competitive learning rather than the error-correction learning (e.g., backpropagation with gradient descent) used by other artificial neural networks. The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and therefore is sometimes called a Kohonen map or Kohonen network. The Kohonen map or network is a computationally convenient abstraction building on biological models of neural systems from the 1970s and morphogenesis models dating back to Alan Turing in the 1950s. SOMs create internal representations reminiscent of the cortical homunculus, a distorted representation of the human body, based on a neurological "map" of the areas and proportions of the human brain dedicated to processing sensory functions, for different parts of the body. == Overview == Self-organizing maps, like most artificial neural networks, operate in two modes: training and mapping. First, training uses an input data set (the "input space") to generate a lower-dimensional representation of the input data (the "map space"). Second, mapping classifies additional input data using the generated map. The goal of training is to represent an input space with p dimensions as a map space with n dimensions, where p > n. Specifically, an input space with p variables is said to have p dimensions. A map space consists of components called "nodes" or "neurons", which are arranged as a hexagonal or rectangular grid with two dimensions. The number of nodes and their arrangement are specified beforehand based on the larger goals of the analysis and exploration of the data. Each node in the map space is associated with a "weight" vector, which is the position of the node in the input space. While nodes in the map space stay fixed, training consists in moving weight vectors toward the input data (reducing a distance metric such as Euclidean distance) without spoiling the topology induced from the map space. After training, the map can be used to classify additional observations for the input space by finding the node with the closest weight vector (smallest distance metric) to the input space vector. == Learning algorithm == The goal of learning in the self-organizing map is to cause different parts of the network to respond similarly to certain input patterns. This is partly motivated by how visual, auditory or other sensory information is handled in separate parts of the cerebral cortex in the human brain. The weights of the neurons are initialized either to small random values or sampled evenly from the subspace spanned by the two largest principal component eigenvectors. With the latter alternative, learning is much faster because the initial weights already give a good approximation of SOM weights. The network must be fed a large number of example vectors that represent, as close as possible, the kinds of vectors expected during mapping. The examples are usually administered several times as iterations. The training utilizes competitive learning. When a training example is fed to the network, its Euclidean distance to all weight vectors is computed. The neuron whose weight vector is most similar to the input is called the best matching unit (BMU). The weights of the BMU and neurons close to it in the SOM grid are adjusted towards the input vector. The magnitude of the change decreases with time and with the grid-distance from the BMU. The update formula for a neuron v with weight vector Wv(s) is W v ( s + 1 ) = W v ( s ) + θ ( u , v , s ) ⋅ α ( s ) ⋅ ( D ( t ) − W v ( s ) ) {\displaystyle W_{v}(s+1)=W_{v}(s)+\theta (u,v,s)\cdot \alpha (s)\cdot (D(t)-W_{v}(s))} , where s is the step index, t is an index into the training sample, u is the index of the BMU for the input vector D(t), α(s) is a monotonically decreasing learning coefficient; θ(u, v, s) is the neighborhood function which gives the distance between the neuron u and the neuron v in step s. Depending on the implementations, t can scan the training data set systematically (t is 0, 1, 2...T-1, then repeat, T being the training sample's size), be randomly drawn from the data set (bootstrap sampling), or implement some other sampling method (such as jackknifing). The neighborhood function θ(u, v, s) (also called function of lateral interaction) depends on the grid-distance between the BMU (neuron u) and neuron v. In the simplest form, it is 1 for all neurons close enough to BMU and 0 for others, but the Gaussian and Mexican-hat functions are common choices, too. Regardless of the functional form, the neighborhood function shrinks with time. At the beginning when the neighborhood is broad, the self-organizing takes place on the global scale. When the neighborhood has shrunk to just a couple of neurons, the weights are converging to local estimates. In some implementations, the learning coefficient α and the neighborhood function θ decrease steadily with increasing s, in others (in particular those where t scans the training data set) they decrease in step-wise fashion, once every T steps. This process is repeated for each input vector for a (usually large) number of cycles λ. The network winds up associating output nodes with groups or patterns in the input data set. If these patterns can be named, the names can be attached to the associated nodes in the trained net. During mapping, there will be one single winning neuron: the neuron whose weight vector lies closest to the input vector. This can be simply determined by calculating the Euclidean distance between input vector and weight vector. While representing input data as vectors has been emphasized in this article, any kind of object which can be represented digitally, which has an appropriate distance measure associated with it, and in which the necessary operations for training are possible can be used to construct a self-organizing map. This includes matrices, continuous functions or even other self-organizing maps. === Algorithm === Randomize the node weight vectors in a map For s = 0 , 1 , 2 , . . . , λ {\displaystyle s=0,1,2,...,\lambda } Randomly pick an input vector D ( t ) {\displaystyle {D}(t)} Find the node in the map closest to the input vector. This node is the best matching unit (BMU). Denote it by u {\displaystyle u} For each node v {\displaystyle v} , update its vector by pulling it closer to the input vector: W v ( s + 1 ) = W v ( s ) + θ ( u , v , s ) ⋅ α ( s ) ⋅ ( D ( t ) − W v ( s ) ) {\displaystyle W_{v}(s+1)=W_{v}(s)+\theta (u,v,s)\cdot \alpha (s)\cdot (D(t)-W_{v}(s))} The variable names mean the following, with vectors in bold, s {\displaystyle s} is the current iteration λ {\displaystyle \lambda } is the iteration limit t {\displaystyle t} is the index of the target input data vector in the input data set D {\displaystyle \mathbf {D} } D ( t ) {\displaystyle {D}(t)} is a target input data vector v {\displaystyle v} is the index of the node in the map W v {\displaystyle \mathbf {W} _{v}} is the current weight vector of node v {\displaystyle v} u {\displaystyle u} is the index of the best matching unit (BMU) in the map θ ( u , v , s ) {\displaystyle \theta (u,v,s)} is the neighbourhood function, α ( s ) {\displaystyle \alpha (s)} is the learning rate schedule. The key design choices are the shape of the SOM, the neighbourhood function, and the learning rate schedule. The idea of the neighborhood function is to make it such that the BMU is updated the most, its immediate neighbors are updated a little less, and so on. The idea of the learning rate schedule is to make it so that the map updates are large at the start, and gradually stop updating. For example, if we want to learn a SOM using a square grid, we can index it using ( i , j ) {\displaystyle (i,j)} where both i , j ∈ 1 : N {\displaystyle i,j\in 1:N} . The neighborhood function can make it so that the BMU updates in full, the nearest neighbors update in half, and their neighbors update in half again, etc. θ ( ( i , j ) , ( i ′ , j ′ ) , s ) = 1 2 | i − i ′ | + | j − j ′ | = { 1 if i = i ′ , j = j ′ 1 / 2 if | i − i ′ | + | j − j ′ | = 1 1 / 4 if | i − i ′ | + | j − j ′ | = 2 ⋯ ⋯ {\displaystyle \theta ((i,j),(i',j'),s)={\frac {1}{2^{|i-i'|+|j-j'|}}}={\begin{cases}1&{\text{if }}i=i',j=j'\\1/2&{\text{if
Persian Speech Corpus
The Persian Speech Corpus is a Modern Persian speech corpus for speech synthesis. The corpus contains phonetic and orthographic transcriptions of about 2.5 hours of Persian speech aligned with recorded speech on the phoneme level, including annotations of word boundaries. Previous spoken corpora of Persian include FARSDAT, which consists of read aloud speech from newspaper texts from 100 Persian speakers and the Telephone FARsi Spoken language DATabase (TFARSDAT) which comprises seven hours of read and spontaneous speech produced by 60 native speakers of Persian from ten regions of Iran. The Persian Speech Corpus was built using the same methodologies laid out in the doctoral project on Modern Standard Arabic of Nawar Halabi at the University of Southampton. The work was funded by MicroLinkPC, who own an exclusive license to commercialise the corpus, though the corpus is available for non-commercial use through the corpus' website. It is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The corpus was built for speech synthesis purposes, but has been used for building HMM based voices in Persian. It can also be used to automatically align other speech corpora with their phonetic transcript and could be used as part of a larger corpus for training speech recognition systems. == Contents == The corpus is downloadable from its website, and contains the following: 396 .wav files containing spoken utterances 396 .lab files containing text utterances 396 .TextGrid files containing the phoneme labels with time stamps of the boundaries where these occur in the .wav files. phonetic-transcript.txt which has the form "[wav_filename]" "[Phoneme Sequence]" in every line orthographic-transcript.txt which has the form "[wav_filename]" "[Orthographic Transcript]" in every line
Masking (art)
In art, craft, and engineering, masking is the use of materials to protect areas from change, or to focus change on other areas. This can describe either the techniques and materials used to control the development of a work of art by protecting a desired area from change; or a phenomenon that (either intentionally or unintentionally) causes a sensation to be concealed from conscious attention. The term is derived from the word mask, in the sense that it hides the face from view. == In painting == Masking materials supplement a painter's dexterity and choice of applicator to control where paint is laid. Examples include the use of a stencil or masking tape to protect areas which are not to be painted. === Solid masks === Most solid masks require an adhesive to hold the mask in place while work is performed. Some, such as masking tape and frisket, come with adhesive pre-applied. Solid masks are readily available in bulk, and are used in large painting jobs. Paper products Kraft paper Butcher paper Masking tape Plastic film Frisket Polyester tape Stencils Silk screen === Liquid masks === Liquid masks are preferred where precision is needed; they prevent paint from seeping underneath, resulting in clean edges. Care must be taken to remove them without damaging the work underneath. Latex or other polymers Molten wax Gesso, typically a substrate for painting, but can also be applied to achieve masking effects == In photography == Masks used for photography are used to enhance the quality of an image. Representations of a scene—whether film, video display, or printed—do not have the dynamic contrast range available to the human eye looking directly at the same scene. Adjusting the contrast in an image helps restore some of the perceived qualities of the original scene. These adjustments are typically performed on "blown-out" highlights, and "crushed" or "muddy" shadow areas, where clipping has occurred; or on desaturated colors. Photographic masks are peculiar in that they are produced from the image they will alter, an exercise in recursion. Masks used to produce other effects are similar to those used in painting. === Controlling exposure === ==== Film ==== The basic methods of controlling exposure are dodging and burning, which respectively lighten (reduce exposure) and darken (increase exposure) areas of an image. The tools a film photographer uses range from shaped pieces of black material (such as studio foil, foam, and paper) to the photographer's hands. To create a photographic mask, a sheet of negative film is contact-exposed to the original film negative or slide positive in a particular way. Both films are then combined to produce a processed positive. The process is similar when applied using digital techniques: the inverse of the working image is reduced to an image mask; filters or other adjustments are then applied, using the mask to selectively block portions of the image. ==== Digital ==== Image editors offer at the very least a "Select All" command and a rectangular "marquee" selection tool. (The word "marquee" describes the "crawling ants" border used to highlight the active region.) Once a selection is created, further changes to the image will be confined to that area. To continue editing the rest of the image, the selection is either "deselected" or the entire image is selected. Advanced suites offer more ways to select portions of an image, as well as ways to combine these selections through. Selection masks can be switched between an editable greyscale image and a mask. They allow the user to create a mask using the suite's painting tools. === Contrast masking === When the contrast range of an image needs to be adjusted, a contrast mask is a simple solution. The processed image resembles what would be achieved when exposing through a neutral density filter, but the effects are focused highly upon the extreme regions of the image. The blocking areas of the mask coincide with the highlights of the image, and the permissive areas with the shadows, resulting in more detail appearing in each. ==== Film ==== The mask is often made from high-quality black-and-white film, such as Kodak Technical Pan, which allows for a degree of softening on the mask. Its processing time is reduced so as to not completely oppose the original negative. Both negatives are combined and registered, and collectively exposed with additional time to compensate for the presence of the mask. ==== Digital ==== Contrast masking is made simpler with digital editing. A grayscale version of the image is produced, either by desaturation or by calculating selected ratios of the image's color channels, inverted, and blurred. The mask and original image are blended together to produce the final processed image. Some image editors allow for refinement of the effect by changing the strength of the blend. Contrast masking can be considered to be the opposite of gamma correction, which adjusts the midtones of an image. Effects similar to contrast masking can be achieved by adjusting the response curves of an image. === Unsharp masking === A derivative of contrast masking is unsharp masking, an unusual term for a process intended to increase the apparent sharpness (acutance) of an image. Unsharp masking uses a blurred form of the image to increase contrast along regions of moderate contrast difference. Around edges, the blur region causes highlights to overexpose and shadows to underexpose. Taken to an extreme, the edges become overly visible and detract from the quality of the image—this is referred to as halation. Unsharp masking does not increase the actual sharpness, as it cannot recover details lost to blurring. ==== Film ==== Unsharp masking allows the photographer to sharpen areas that have become blurred in the original negative, due to long shutter speed/exposure time, or from using a wide aperture/"fast" lens. When creating the unsharp mask, extra space or diffusing material is added between the image and the mask to produce the necessary blur. ==== Digital ==== Unsharp masking has become automated in digital editing, with higher-end suites offering the process as a "tool" or "filter" in their standard sharpening kits—the actual creation of a mask is bypassed in favor of calculations that represent the mask's effect. The process depends on three factors: the radius of the blur, the strength of the effect, and the threshold degree of contrast above which the effect will be applied. (Adjusting the threshold allows the editor to apply the effect selectively upon moderately defined edges and ignore image noise.) Unsharp masking is computationally more complex than other sharpening algorithms, but results in a higher-quality remedy. Deconvolution allows for truer sharpening, but is much more complex than unsharp masking.
NSynth
NSynth (a portmanteau of "Neural Synthesis") is a WaveNet-based autoencoder for synthesizing audio, outlined in a paper in April 2017. == Overview == The model generates sounds through a neural network based synthesis, employing a WaveNet-style autoencoder to learn its own temporal embeddings from four different sounds. Google then released an open source hardware interface for the algorithm called NSynth Super, used by notable musicians such as Grimes and YACHT to generate experimental music using artificial intelligence. The research and development of the algorithm was part of a collaboration between Google Brain, Magenta and DeepMind. == Technology == === Dataset === The NSynth dataset is composed of 305,979 one-shot instrumental notes featuring a unique pitch, timbre, and envelope, sampled from 1,006 instruments from commercial sample libraries. For each instrument the dataset contains four-second 16 kHz audio snippets by ranging over every pitch of a standard MIDI piano, as well as five different velocities. The dataset is made available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. === Machine learning model === A spectral autoencoder model and a WaveNet autoencoder model are publicly available on GitHub. The baseline model uses a spectrogram with fft_size 1024 and hop_size 256, MSE loss on the magnitudes, and the Griffin-Lim algorithm for reconstruction. The WaveNet model trains on mu-law encoded waveform chunks of size 6144. It learns embeddings with 16 dimensions that are downsampled by 512 in time. == NSynth Super == In 2018 Google released a hardware interface for the NSynth algorithm, called NSynth Super, designed to provide an accessible physical interface to the algorithm for musicians to use in their artistic production. Design files, source code and internal components are released under an open source Apache License 2.0, enabling hobbyists and musicians to freely build and use the instrument. At the core of the NSynth Super there is a Raspberry Pi, extended with a custom printed circuit board to accommodate the interface elements. == Influence == Despite not being publicly available as a commercial product, NSynth Super has been used by notable artists, including Grimes and YACHT. Grimes reported using the instrument in her 2020 studio album Miss Anthropocene. YACHT announced an extensive use of NSynth Super in their album Chain Tripping. Claire L. Evans compared the potential influence of the instrument to the Roland TR-808. The NSynth Super design was honored with a D&AD Yellow Pencil award in 2018.