AI For Business Specialization Upenn

AI For Business Specialization Upenn — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Digital art

    Digital art, or the digital arts, is artistic work that uses digital technology as part of the creative or presentational process. It can also refer to computational art that uses and engages with digital media. Since the 1960s, various names have been used to describe digital art, including computer art, electronic art, multimedia art, and new media art. Digital art includes pieces stored on physical media, such as digital paintings, as well as digital galleries on websites. Digital art also extends to the field of visual computing. == History == In the early 1960s, John Whitney developed the first computer-generated art using mathematical operations. In 1963, Ivan Sutherland invented the first interactive computer-graphics interface, known as Sketchpad. Between 1974 and 1977, Salvador Dalí created two large canvases of Gala Contemplating the Mediterranean Sea, which at a distance of 20 meters is transformed into a portrait of Abraham Lincoln (Homage to Rothko), as well as prints of Lincoln in Dalivision, based on a portrait of Abraham Lincoln that Leon Harmon processed on a computer and published in "The Recognition of Faces". The technique is similar to what later became known as photographic mosaics. Andy Warhol created digital art using an Amiga when the computer was publicly introduced at Lincoln Center in July 1985. An image of Debbie Harry was captured in monochrome from a video camera and digitized into a graphics program called ProPaint. Warhol manipulated the image by adding color using flood fills. == Art made for digital media == Artwork that is highly computational, presented through digital media, and explicitly engages with digital technologies is categorized as "art made for digital media". This differs from art using digital tools, which incorporates digital technology in the creation process but may exist outside the digital world. 
Digital art historian Christiane Paul writes that it "is highly problematic to classify all art that makes use of digital technologies somewhere in its production and dissemination process as digital art since it makes it almost impossible to arrive at any unifying statement about the art form". == Art that uses digital tools == Digital art can be purely computer-generated (such as fractals and algorithmic art) or taken from other sources, such as a scanned photograph or an image drawn with a mouse or graphics tablet in vector graphics software. Artworks are considered digital paintings when they are created in the manner of non-digital paintings but with software on a computer, and the resulting image is output digitally as if painted on canvas. Despite differing viewpoints on digital technology's impact on the arts, there is a consensus within the digital art community that it has greatly broadened the creative opportunities available to professional and non-professional artists alike. == Art theorists and art historians == Notable art theorists and historians in this field include: Oliver Grau, Jon Ippolito, Christiane Paul, Frank Popper, Jasia Reichardt, Mario Costa, Christine Buci-Glucksmann, Dominique Moulon, Roy Ascott, Catherine Perret, Margot Lovejoy, Edmond Couchot, Tina Rivers Ryan, Fred Forest and Edward A. Shanken. === Digital painting === Digital painting is either a physical painting made with digital electronics and spray-paint robotics within a fine-art digital art context, or pictorial imagery made with pixels on a computer screen that mimics artworks from the traditional histories of painting and illustration. === Artificial intelligence art === Artists have used artificial intelligence to create artwork since at least the 1960s. 
Since generative adversarial networks (GANs) were introduced in 2014, some artists have created artwork with them. A GAN is a machine learning framework in which two neural networks, a generator and a discriminator, compete with each other and improve iteratively. GANs can be used to generate pictures with visual effects similar to traditional fine art. The essential idea of text-to-image generators is that people describe a picture in words and an AI model converts the description into visual content, so anyone can turn language into a painting. == Digital art education == Digital art education has become more common with the advancement of digital hardware and software. Relevant hardware includes graphics tablets, styluses, tablet computers, 3D scanners, virtual reality headsets, and digital cameras; relevant software includes digital art software, 3D modeling software, 3D rendering, digital sculpting, 2D graphics software, digital painting, 3D terrain generation, 2D animation software, 3D animation software, raster graphics editors, vector graphics editors, mathematical art software, and video editing software. == Scholarship and archives == In addition to the creation of original art, research methods that use AI have been developed to quantitatively analyze digital art collections. This has been made possible by the large-scale digitization of artwork over the past few decades. Although the main goal of digitization was to make these collections accessible and explorable, using AI to analyze them has opened up new research perspectives. Two computational methods, close reading and distant viewing, are the typical approaches used to analyze digitized art. Close reading focuses on specific visual aspects of one piece. Tasks performed by machines in close reading include computational artist authentication and analysis of brushstrokes or texture properties. 
In contrast, distant viewing methods statistically visualize the similarity of a specific feature across an entire collection. Common tasks for this method include automatic classification, object detection, multimodal tasks, knowledge discovery in art history, and computational aesthetics. Whereas distant viewing analyzes large collections, close reading examines a single piece of artwork. Although 2D and 3D digital copies help preserve history that might otherwise be destroyed by events such as natural disasters and war, they raise the question of who should own these 3D scans, that is, who should hold the digital copyrights. === Computer demos === Computer demos are computer programs, usually non-interactive, that produce audiovisual presentations. They are a novel form of art that emerged as a consequence of the home computer revolution in the early 1980s. In the classification of digital art, they are best described as real-time, procedurally generated, animated audiovisuals. This form of art concentrates not only on the aesthetics of the final presentation but also on the complexity and skill involved in creating it. As such, it can be fully appreciated only by people with a relatively deep knowledge of the relevant computer technologies. As Hua Jin and Jie Yang put it in the context of art design teaching, using computer-aided design software to present class content "is not to advocate computer-aided design instead of hand-drawn performance, but to make it serve the profession earlier through a more reasonable course arrangement." On the other hand, many demos are primarily aesthetic or amusing and can be enjoyed by the general public. === Digital installation art === Digital installation art constitutes a broad field of artistic practices and a variety of forms. 
Some resemble video installations, especially large-scale works involving projections and live video capture. Many digital installations attempt to create immersive environments by using projection techniques that enhance the audience's sense of sensory envelopment. Others go further and attempt to facilitate complete immersion in virtual realms. This type of installation is generally site-specific, scalable, and without fixed dimensions, meaning it can be reconfigured to accommodate different presentation spaces. Scott Snibbe's "Boundary Functions" is an example of augmented reality digital installation art: it responds to people who enter the installation by drawing lines between them that mark out each person's personal space. Noah Wardrip-Fruin's "Screen" (2003) uses a Cave Automatic Virtual Environment (CAVE) to create an interactive, text-based digital experience that engages the viewer in a multi-sensory interaction. === Internet art and net.art === Internet art is digital art that uses the specific characteristics of the Internet and is exhibited on the Internet. The term "internet art" is subsumed under "net art", whose artists assume that the network will keep being renewed over time. The term "post-internet art" is therefore used to exclude artworks outside of internet media. A representative example is Protocols for Achievements, which is a digital photo frame that confronts the aestheti

  • Text normalization

    Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure. == Applications == Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example, "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan, and "vi" could be pronounced as "vie", "vee", or "the sixth" depending on the surrounding words. Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé", the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed. == Techniques == For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions suffice. For example, the GNU sed command sed -E "s/\s+/ /g" inputfile normalizes runs of whitespace characters into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation. 
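As a minimal sketch of the search-oriented normalization described above, the following uses Python's standard `unicodedata` and `re` modules to strip diacritics, fold case and collapse whitespace. The function name `normalize_for_search` is hypothetical, and the sketch deliberately leaves out stemming, stop-word removal and text-to-speech expansion.

```python
import re
import unicodedata


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: map text to a canonical form for search matching."""
    # NFKD decomposition splits accented letters into a base letter plus
    # combining marks (e.g. "é" -> "e" + U+0301) and also expands
    # compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, which removes the diacritics.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    folded = without_marks.casefold()
    # Collapse runs of whitespace into one space, like the sed example.
    return re.sub(r"\s+", " ", folded).strip()


print(normalize_for_search("  Résumé\tof  John "))
```

With this helper, "Résumé" and "resume" (or "John" and "john") normalize to the same string. Whether stripping diacritics is appropriate depends on the language and the application, in line with the point that there is no all-purpose normalization procedure.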
== Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization, for example in the expansion of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic (or semi-diplomatic) edition, in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements) and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor and will vary: some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or on external criteria, where the norms of a different time period are applied. As an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary with the language of the edition and the specific conventions of the publisher.

  • Photo-consistency

    In computer vision, photo-consistency is used to determine whether a given voxel is occupied. A voxel is considered photo-consistent when its color appears similar in all the cameras that can see it. Most voxel coloring and space carving techniques use photo-consistency as a check condition in image-based modeling and rendering applications. == Usage == Photo-consistency is used in 3D volumetric reconstruction, image registration, and multi-view reconstruction.
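The check can be sketched as follows, assuming the voxel has already been projected into every camera that can see it and one RGB color has been sampled per view. The spread measure (root-mean-square distance from the mean color) and the default threshold are illustrative choices, not a standard definition; published methods use a variety of consistency measures.

```python
from math import sqrt
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def is_photo_consistent(samples: Sequence[RGB], threshold: float = 10.0) -> bool:
    """Return True if the colors seen for one voxel agree across cameras.

    `samples` holds one RGB color per camera that can see the voxel; projection
    and visibility computation are assumed to have happened elsewhere.
    """
    n = len(samples)
    if n < 2:
        # Nothing to compare against, so this sketch keeps the voxel.
        return True
    mean = [sum(color[k] for color in samples) / n for k in range(3)]
    # Root-mean-square Euclidean distance of each sample from the mean color.
    spread = sqrt(
        sum(sum((color[k] - mean[k]) ** 2 for k in range(3)) for color in samples) / n
    )
    return spread <= threshold


print(is_photo_consistent([(200, 10, 10), (202, 12, 9), (199, 11, 10)]))
print(is_photo_consistent([(200, 10, 10), (20, 20, 200)]))
```

In a space-carving loop, voxels that fail this test would be carved away, and surviving voxels could be colored with their mean sample color.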

  • Amazon Q

    Amazon Q is a chatbot developed by Amazon for enterprise use. Based on both Amazon Titan and GPT-5, it was announced on November 28, 2023. At launch, it was part of the Amazon Web Services management console. Amazon CodeWhisperer is part of Amazon Q Developer, which is in turn part of Amazon Q. == History == Amazon's business-focused chatbot Q was announced in preview on November 28, 2023, with a full version available at $20 per person per month. On July 19, 2025, the Amazon Q Visual Studio Code extension was compromised with code intended to delete the user's home directory. The issue was fixed on July 21. == Capabilities == Q can be prompted to summarize long documents and group chats, create charts, perform data analysis, and write code. Q is also capable of accessing non-Amazon services. The chatbot uses the Amazon Bedrock repository of foundation models.

  • AppBlock

    AppBlock is a screen-time management tool that limits access to selected mobile applications and websites. Developed by the Czech studio MobileSoft, it is distributed for Android and iOS devices, as browser extensions for Google Chrome, Microsoft Edge and Brave, and as desktop solutions. The application is used primarily to restrict time spent on social media and similar distracting services while working and studying. By 2025, the application reported 700,000 monthly active users, with the domestic Czech market accounting for less than one percent of its total user base and revenue. == History == === Origins === AppBlock was created by the Czech software studio MobileSoft, based in Hradec Králové. The studio was founded in 2012 by Miroslav Novosvětský, who remains the sole owner. The idea for the application arose from the use of browser-based website blockers on desktop computers; AppBlock was conceived as a way to reduce the time spent on mobile devices. === Early releases === In its early phase, AppBlock was available only for Android phones. Early versions allowed users to limit access to selected applications and websites during specified periods. From the outset, the application was distributed internationally rather than only within the Czech market, and early coverage reported downloads in the millions worldwide. === Expansion of functionality === Over time, AppBlock has expanded beyond basic application blocking to include additional functions for limiting procrastination and managing attention. Development of AppBlock accelerated during the COVID-19 pandemic: following a reduction in external client orders, the studio reallocated resources from contract development to the application, and increased digital content consumption during lockdowns contributed to a rise in the application's usage and revenue. As the application developed, it became the company's product with the largest user base. 
Novosvětský described an increase in downloads over a twelve-month period, which he linked in part to the company's activities abroad, including participation in mobile marketing events in the United States. These activities were an important factor in the further development of AppBlock. === Internationalization and market expansion === Within roughly the first eight years of its existence, MobileSoft became active both in the domestic Czech market and in the United States, supported in part by the CzechAccelerator program, which is intended to help Czech firms enter foreign markets. In mid-August 2021 the developers launched a version for iOS, which soon began to attract paying users. The expansion to iOS was accompanied by plans for cooperation with the Procrastination.com platform, intended to complement the blocking functions with educational content on digital media use, sleep and work habits. By 2025, AppBlock was localised into 15 languages. The largest shares of users were in the United States, the United Kingdom, Germany and France, with recent growth in Brazil and usage extending across several continents. AppBlock has reached more than 10 million installations. In the same period its creators announced plans to refine existing functions and to expand support beyond mobile phones to desktop use, including support for additional web browsers. == Features == === Supported platforms === AppBlock is distributed as a mobile application for Android and iOS through Google Play and the Apple App Store. Browser extensions for desktop systems are available for Google Chrome, Microsoft Edge and Brave. === Functionality === AppBlock's core function is to restrict access to selected applications and websites. The mobile application shows a list of installed apps and lets the user select which ones to block. 
It also includes tools to block specific websites and, on iOS, to block certain phrases entered in the Safari browser. AppBlock can mute notifications from selected applications so that their alerts do not appear while blocking is active. Besides choosing which apps or content to block, users can switch to an allowlist mode, in which only selected applications remain accessible and all others are blocked. Blocking rules are organized into configurable schedules called profiles, which define the time periods when selected apps and websites are unavailable. Newer versions also allow profiles to be activated automatically based on the time of day, the day of the week, the device's location, or connection to specific Wi-Fi networks. The iOS version lets users set limits on how often or how long certain apps can be used before they are blocked, and it can track and restrict screen time for individual apps. In addition to these recurring rules, a Quick Block feature temporarily blocks selected apps and websites with a single action, without requiring a separate long-term schedule. Strict Mode is an optional setting that locks blocking once it is active: for a specified period, users cannot edit, modify or disable AppBlock's rules, and the app can be configured so that it cannot be uninstalled. Deactivation requires specific verification steps, such as connecting the device to a charger or obtaining approval from a designated contact person. The mobile application also includes statistics and reporting: users can view data about their use of applications and websites, including screen-time summaries, and can run focus sessions that silence notifications and enforce blocking during defined work or study periods. 
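AppBlock's internal data model is not described here, so the following is only a hypothetical sketch of how schedule-based profiles, with an optional Wi-Fi condition and an allowlist mode, could be evaluated. The `Profile` class, its fields and the `is_blocked` function are invented for illustration and do not reflect AppBlock's actual implementation; location triggers, usage limits and Strict Mode are left out.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical blocking profile; not AppBlock's actual format."""
    apps: FrozenSet[str]             # blocked apps, or allowed apps in allowlist mode
    start: time
    end: time
    weekdays: FrozenSet[int]         # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    allowlist: bool = False
    wifi_ssid: Optional[str] = None  # if set, the profile applies only on this network

    def is_active(self, now: datetime, current_ssid: Optional[str] = None) -> bool:
        if now.weekday() not in self.weekdays:
            return False
        if self.wifi_ssid is not None and current_ssid != self.wifi_ssid:
            return False
        t = now.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wrapping past midnight (e.g. 22:00-06:00). Simplification:
        # the weekday check above uses the current calendar day.
        return t >= self.start or t < self.end


def is_blocked(app: str, profiles: Iterable[Profile], now: datetime,
               current_ssid: Optional[str] = None) -> bool:
    """An app is blocked if any currently active profile blocks it."""
    for profile in profiles:
        if not profile.is_active(now, current_ssid):
            continue
        if profile.allowlist:
            if app not in profile.apps:
                return True
        elif app in profile.apps:
            return True
    return False


work = Profile(frozenset({"instagram", "tiktok"}), time(9), time(17), frozenset(range(5)))
print(is_blocked("instagram", [work], datetime(2024, 1, 1, 10, 0)))  # a Monday morning
```

Evaluating every active profile and blocking if any of them matches is one simple way to combine overlapping schedules; a real product might resolve conflicts differently.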
Browser extensions for desktop environments apply AppBlock's website-blocking functions on Windows and macOS through supported web browsers. == Business model == AppBlock uses a freemium revenue model. The basic version is free of charge and allows up to three applications to be blocked at the same time; the premium version removes this limit and adds further configuration options. In 2020, the application shifted from a one-time payment to a subscription model. By 2021, AppBlock had more than seven thousand paying users and annual revenue of about four million Czech crowns. By 2025, annual revenue reached approximately 4 million US dollars (80 million CZK) before taxes and platform fees, with roughly 20 percent of active users subscribing to the paid version. == Usage == AppBlock limits access to selected applications and websites in order to reduce smartphone overuse and digital distraction. It is used to block social media, games and other services considered addictive, with the aim of reducing frequent checking of mobile devices and creating time intervals in which these services are unavailable. Reported use cases include work, study, parenting, ADHD, mental health, well-being and business. The application is used both by individuals and in workplace initiatives in which employees install it to reduce digital distractions during working hours.

  • Automatic acquisition of lexicon

    Automatic acquisition of lexicon is a computerized process for developing a complex morphological lexicon of a language. Such a lexicon is essential for natural language processing (NLP) and is a prerequisite for any wide-coverage parser. The two main inputs are a raw corpus and a morphological description of the language. The aim is to provide lemmas that account for all the words that occur in the corpus. To obtain a high-quality lexicon, the generated lemmas must be validated manually and the whole process iterated several times. The process focuses on the open word classes (e.g. nouns, adjectives, verbs); closed classes (e.g. prepositions, pronouns, numerals) are excluded. The method is applicable to languages with rich morphology, such as Slovak, Russian or Croatian. Applied to Slovak, an inflectional language, automatic acquisition covers inflectional as well as derivational morphology. This lets users find information about derivational relations (e.g. adjectivizations, prefixes) in the lexicon. For example, the Slovak word korpusový is an adjectivization of korpus (English: corpus). == Three-step loop == According to Benoît Sagot, the acquisition of lemmas involves three stages: (1) generation and inflection, (2) ranking, and (3) manual validation. The more iterations are performed, the more accurate the resulting lexicon. Each iteration depends on the information supplied by a manual validator. === Generation and inflection === First, all words belonging to the closed word classes (pronouns, prepositions, numerals) are manually excluded from the corpus, and the number of occurrences of each word form in the corpus is provided. Next, hypothetical lemmas are generated automatically according to the morphological description of the language. 
The generated lemmas are then inflected so that all of their forms are built, and each obtained form is associated with the corresponding lemma and a morphological tag. === Ranking === A probabilistic model, implemented as a fixed-point algorithm, ranks the hypothetical lemmas generated in the first step. Ideally, the best-ranked lemmas are all correct, whereas the lowest-ranked lemmas tend to be incorrect. === Manual validation === The correctness of the best-ranked lemmas from the previous step is checked by a manual validator, who should be a native speaker. At this stage, lemmas are divided into three categories: valid lemmas, which are appended to the lexicon; erroneous lemmas generated from valid forms (these forms are later associated with other lemmas); and erroneous lemmas generated from invalid forms, which need to be excluded. == Future development == Compared with purely manual lexicon development, automatic acquisition appears promising because it needs only a short validation time and relatively little human labor.
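The generate, inflect and rank loop can be illustrated with a toy sketch. Everything here is an assumption: the two `PARADIGMS` are drastically simplified, hypothetical Slovak-like ending tables rather than a real morphological description, closed-class words are assumed to have been removed already, and the ranking score (the share of a lemma's generated forms that are attested in the corpus) is a simple stand-in for Sagot's probabilistic fixed-point model, not that model itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified paradigms:
# name -> (ending of the lemma, endings of all inflected forms).
PARADIGMS: Dict[str, Tuple[str, List[str]]] = {
    "masc-hard": ("", ["", "u", "om", "e", "y", "ov", "och", "mi"]),
    "fem-a": ("a", ["a", "y", "e", "u", "ou", "ám", "ách", "ami"]),
}

Hypothesis = Tuple[str, str]  # (lemma, paradigm name)


def generate_and_inflect(corpus: Counter) -> Dict[Hypothesis, List[str]]:
    """Step 1: hypothesize lemmas from corpus forms and build all their forms."""
    hypotheses: Dict[Hypothesis, List[str]] = {}
    for form in corpus:
        for name, (lemma_ending, endings) in PARADIGMS.items():
            for ending in endings:
                # Any form ending in a paradigm ending suggests a candidate stem.
                if form.endswith(ending) and len(form) > len(ending):
                    stem = form[: len(form) - len(ending)]
                    hypotheses[(stem + lemma_ending, name)] = [stem + e for e in endings]
    return hypotheses


def rank(hypotheses: Dict[Hypothesis, List[str]], corpus: Counter) -> List[Hypothesis]:
    """Step 2 (simplified): rank by the share of generated forms seen in the corpus.

    This toy score ignores the occurrence counts; a real model would use them.
    """
    def score(h: Hypothesis) -> float:
        forms = hypotheses[h]
        return sum(1 for f in forms if f in corpus) / len(forms)
    return sorted(hypotheses, key=score, reverse=True)


corpus = Counter({"korpus": 5, "korpusu": 3, "korpusom": 2, "korpusy": 4,
                  "žena": 3, "ženy": 2, "ženou": 1})
ranked = rank(generate_and_inflect(corpus), corpus)
# Step 3 would hand the top of this list to a native-speaker validator.
print(ranked[:3])
```

On this toy corpus the plausible lemmas "korpus" and "žena" rise to the top, while spurious hypotheses such as "korpusy" (masculine) sink because few of their generated forms are attested.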

  • Healthy Together

    Healthy Together is a health technology company that provides software for health and human services departments. Healthy Together supports a "One Door" approach to eligibility, enrollment, and management for programs such as Medicaid, the Supplemental Nutrition Assistance Program, TANF and WIC, as well as behavioral health (988), disease surveillance, vital records, child welfare and more. The platform is intended to increase the reach and efficacy of program initiatives, improve health equity and reduce costs. The software is available in the United States, with current deployments in Florida and Oklahoma. The United States Department of Veterans Affairs also uses Healthy Together's mobile platform. == Development == Healthy Together launched in March 2020 and builds software for public health and health and human services departments. The Florida Department of Health began using the platform in September 2020 to deliver real-time test results to residents. Over 50% of households in Florida have adopted the mobile application. On December 6, 2022, the Advanced Technology Academic Research Center (ATARC) presented Healthy Together and the Florida Department of Health with a Digital Experience Award at its 2022 GITEC Emerging Technology Award Ceremony in Washington, D.C., recognizing the project's success. The partnership was also highlighted on the Federal News Network's show Federal Drive. The platform is also used at universities in Oklahoma. In November 2022, the United States Department of Veterans Affairs and Healthy Together announced a collaboration to expand veterans' access to their health records. The platform gives 18 million veterans access to their health information through their smartphones and mobile devices. In December 2022, the integration was recognized as one of Healthcare IT News' Top 10 stories of 2022.

  • Convolutional neural network

    A convolutional neural network (CNN) is a type of feedforward neural network that learns features via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different types of data, including text, images and audio. CNNs are the de facto standard in deep learning-based approaches to computer vision and image processing, and have only recently been replaced, in some cases, by newer architectures such as the transformer. The vanishing and exploding gradients seen during backpropagation in earlier neural networks are mitigated by the regularization that comes from using shared weights over fewer connections. For example, each neuron in a fully connected layer would require 10,000 weights to process an image of 100 × 100 pixels, whereas cascaded convolution (or cross-correlation) kernels need only 25 weights per 5 × 5 kernel. Higher-layer features are extracted from wider context windows than lower-layer features. Some applications of CNNs include image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain–computer interfaces, and financial time series. CNNs are also known as shift-invariant or space-invariant artificial neural networks, based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are not invariant to translation, due to the downsampling operation they apply to the input. Feedforward neural networks are usually fully connected networks; that is, each neuron in one layer is connected to all neurons in the next layer. The "full connectivity" of these networks makes them prone to overfitting data. 
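The weight counts quoted above can be checked with a little arithmetic. The helpers below are a sketch that assumes one bias per neuron or filter (the text counts weights only) and single-channel (grayscale) input; the function names are made up for illustration.

```python
def dense_layer_params(n_inputs: int, n_neurons: int) -> int:
    """Fully connected: every neuron has its own weight per input, plus a bias."""
    return n_neurons * (n_inputs + 1)


def conv_layer_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Convolutional: each filter's weights are shared across all image positions,
    so the count depends on the kernel size, not on the image size."""
    return n_filters * (kernel_h * kernel_w * in_channels + 1)


pixels = 100 * 100
print(dense_layer_params(pixels, 1))  # one dense neuron: 10,000 weights plus 1 bias
print(conv_layer_params(5, 5, 1, 1))  # one 5 x 5 filter: 25 weights plus 1 bias
```

Doubling the image side to 200 × 200 quadruples the dense count but leaves the convolutional count unchanged, which is the point of weight sharing.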
Typical ways of regularizing, or preventing overfitting, include penalizing parameters during training (such as weight decay) and trimming connectivity (skip connections, dropout, etc.). Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly populated set. Convolutional networks were inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap so that together they cover the entire visual field. CNNs use relatively little pre-processing compared to other image classification algorithms: the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are hand-engineered. This simplifies and automates the process, improving efficiency and scalability by removing the bottleneck of human intervention. == Architecture == A convolutional neural network consists of an input layer, hidden layers and an output layer. In a CNN, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that computes a dot product of the convolution kernel with the layer's input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers, fully connected layers, and normalization layers. Note how closely a convolutional neural network resembles a matched filter. 
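To make the description above concrete, here is a minimal pure-Python sketch of one convolutional stage: a "valid" (unpadded, stride 1) convolution computed as the Frobenius inner product of the kernel with each input patch, a ReLU activation, and 2 × 2 max pooling. Like most deep-learning libraries, it actually computes cross-correlation (the kernel is not flipped). The function names and the toy edge-detecting kernel are illustrative assumptions; real frameworks vectorize this and add channels, strides and padding.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` (no padding, stride 1); each output value is
    the Frobenius inner product of the kernel with the patch under it."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            for c in range(ow)
        ]
        for r in range(oh)
    ]


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map: Matrix) -> Matrix:
    """Non-overlapping 2 x 2 max pooling; an odd trailing row or column is dropped."""
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


image = [[0, 0, 1, 1, 1] for _ in range(5)]  # dark left half, bright right half
kernel = [[-1, 1]]                           # responds where brightness rises left to right
feature_map = relu(conv2d_valid(image, kernel))
print(max_pool2x2(feature_map))
```

The feature map is nonzero only along the dark-to-bright edge, and pooling keeps that response while halving the map's height and width.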
=== Convolutional layers === In a CNN, the input is a tensor with shape (number of inputs) × (input height) × (input width) × (input channels). After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape (number of inputs) × (feature map height) × (feature map width) × (feature map channels). Convolutional layers convolve the input and pass the result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron processes data only for its receptive field. Although fully connected feedforward neural networks can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for each neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper. For example, using 5 × 5 tiling regions that all share the same weights requires only 25 weights. Using shared weights means there are many fewer parameters, which helps avoid the vanishing and exploding gradient problems seen during backpropagation in earlier neural networks. To speed up processing, standard convolutional layers can be replaced by depthwise separable convolutional layers, which consist of a depthwise convolution followed by a pointwise convolution. The depthwise convolution is a spatial convolution applied independently over each channel of the input tensor, while the pointwise convolution is a standard convolution restricted to 1 × 1 kernels. === Pooling layers === Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. 
Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters; tiling sizes such as 2 × 2 are common. Global pooling acts on all the neurons of the feature map. Two types of pooling are in common use: max and average. Max pooling uses the maximum value of each local cluster of neurons in the feature map, while average pooling takes the average value. === Fully connected layers === Fully connected layers connect every neuron in one layer to every neuron in another layer. This is the same structure as a traditional multilayer perceptron (MLP). Each neuron in the fully connected layer receives input from all the neurons in the previous layer. These inputs are multiplied by their weights, summed together with the neuron's bias, and passed through an activation function to perform a nonlinear transformation, generating the output. The final feature maps are flattened into a vector, which then passes through the fully connected layers to classify the image. === Receptive field === In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's receptive field. Typically the area is a square (e.g. 5 by 5 neurons). In a fully connected layer, by contrast, the receptive field is the entire previous layer. Thus, in each successive convolutional layer, each neuron takes input from a larger area of the original input than neurons in previous layers. This is due to applying the convolution over and over, which takes into account the value of a pixel as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers.
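The pooling and fully connected stages described above can be sketched in NumPy. This is a minimal illustration, assuming non-overlapping windows over a single 2-D feature map; `pool2d` and `dense` are hypothetical names, and the layer weights are arbitrary.

```python
import numpy as np

def pool2d(fmap: np.ndarray, size: int = 2, mode: str = "max") -> np.ndarray:
    """Non-overlapping size x size pooling over a single 2-D feature map."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size            # drop incomplete edge windows
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "avg":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode!r}")

def dense(x: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Fully connected layer: weighted sum of all inputs plus bias, then ReLU."""
    return np.maximum(weights @ x + bias, 0.0)

fmap = np.array([[1, 2, 5, 6],
                 [3, 4, 7, 8],
                 [0, 1, 2, 3],
                 [1, 0, 3, 2]], dtype=float)
max_pooled = pool2d(fmap, mode="max")    # [[4, 8], [1, 3]]
avg_pooled = pool2d(fmap, mode="avg")    # [[2.5, 6.5], [0.5, 2.5]]
global_avg = fmap.mean()                 # global average pooling: 3.0

flat = max_pooled.reshape(-1)            # flatten to [4, 8, 1, 3]
weights = np.ones((3, flat.size))        # 3 output neurons (arbitrary weights)
bias = np.array([0.0, -10.0, -20.0])
scores = dense(flat, weights, bias)      # [16, 6, 0]
```

The reshape to `(h // size, size, w // size, size)` groups each 2 × 2 window along axes 1 and 3, so a single `max` or `mean` call pools every window at once.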
To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios, thus having a variable receptive field size. === Weights === Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights. The vectors of weights and biases are called filters and represent particular features of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the memory footprint because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector of weights.
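How the receptive field grows with depth, and how dilation enlarges it, can be computed with the usual recurrence $r_l = r_{l-1} + (k_{\text{eff}} - 1)\, j_{l-1}$, where $k_{\text{eff}} = d(k - 1) + 1$ is the effective size of a $k \times k$ kernel with dilation $d$, and $j$ is the product of the strides of the preceding layers. The sketch below assumes square kernels and ignores padding; the function name is hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit after a stack of layers.

    `layers` is a list of (kernel_size, stride, dilation) tuples, input side first.
    """
    rf, jump = 1, 1                      # jump: input-pixel distance between neighboring units
    for k, stride, dilation in layers:
        k_eff = dilation * (k - 1) + 1   # dilation spreads the k taps apart
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf

plain = [(3, 1, 1)] * 3                          # three ordinary 3 x 3 layers
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]      # same weights per layer, growing dilation
strided = [(3, 2, 1), (3, 2, 1)]
print(receptive_field(plain), receptive_field(dilated), receptive_field(strided))  # 7 15 7
```

Each dilated layer still has only 3 × 3 weights per channel pair, yet the receptive field grows from 7 to 15 pixels; this is the sense in which dilation enlarges the receptive field without adding parameters.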

  • Eigenface

    Eigenface

    An eigenface (EYE-gən-) is one of a set of eigenvectors used in the computer vision problem of human face recognition. The approach of using eigenfaces for recognition was developed by Sirovich and Kirby and used by Matthew Turk and Alex Pentland in face classification. The eigenvectors are derived from the covariance matrix of the probability distribution over the high-dimensional vector space of face images. The eigenfaces themselves form a basis set of all images used to construct the covariance matrix. This produces dimension reduction by allowing the smaller set of basis images to represent the original training images. Classification can be achieved by comparing how faces are represented by the basis set. == History == The eigenface approach began with a search for a low-dimensional representation of face images. Sirovich and Kirby showed that principal component analysis could be used on a collection of face images to form a set of basis features. These basis images, known as eigenpictures, could be linearly combined to reconstruct images in the original training set. If the training set consists of M images, principal component analysis could form a basis set of N images, where N < M. The reconstruction error is reduced by increasing the number of eigenpictures; however, the number needed is always chosen less than M. For example, if you need to generate N eigenfaces for a training set of M face images, each face image can be expressed as "proportions" of all N "features" or eigenfaces: Face image1 = (23% of E1) + (2% of E2) + (51% of E3) + ... + (1% of EN). In 1991, M. Turk and A. Pentland expanded these results and presented the eigenface method of face recognition.
In addition to designing a system for automated face recognition using eigenfaces, they showed a way of calculating the eigenvectors of a covariance matrix such that computers of the time could perform eigen-decomposition on a large number of face images. Face images usually occupy a high-dimensional space, and conventional principal component analysis was intractable on such data sets. Turk and Pentland's paper demonstrated ways to extract the eigenvectors based on matrices sized by the number of images rather than the number of pixels. Once established, the eigenface method was expanded to include methods of preprocessing to improve accuracy. Multiple manifold approaches were also used to build sets of eigenfaces for different subjects and different features, such as the eyes. == Generation == A set of eigenfaces can be generated by performing a mathematical process called principal component analysis (PCA) on a large set of images depicting different human faces. Informally, eigenfaces can be considered a set of "standardized face ingredients", derived from statistical analysis of many pictures of faces. Any human face can be considered to be a combination of these standard faces. For example, one's face might be composed of the average face plus 10% from eigenface 1, 55% from eigenface 2, and even −3% from eigenface 3. Remarkably, it does not take many eigenfaces combined together to achieve a fair approximation of most faces. Also, because a person's face is not stored as a digital photograph, but instead as just a list of values (one value for each eigenface in the database used), much less space is taken for each person's face. The eigenfaces that are created will appear as light and dark areas that are arranged in a specific pattern. This pattern is how different features of a face are singled out to be evaluated and scored.
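The generation process above, together with the steps described under Practical implementation and Computing the eigenvectors, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not a reconstruction of the Matlab example mentioned in the article: it uses synthetic random data in place of a face database, calls the raw image matrix `T` and the mean-subtracted one `A`, and obtains the eigenvectors of $S = AA^T$ from the smaller matrix $A^TA$. The function names are hypothetical.

```python
import numpy as np

def eigenfaces(T: np.ndarray, epsilon: float = 0.95):
    """Eigenfaces from T, a (pixels, M) matrix whose columns are aligned, flattened faces.

    Returns the mean face, the first k eigenfaces (as columns) and their eigenvalues,
    where k is the smallest number whose share of the total variance exceeds epsilon.
    """
    mean_face = T.mean(axis=1, keepdims=True)
    A = T - mean_face                              # mean-subtracted images
    # Decompose the small M x M matrix A^T A instead of S = A A^T (pixels x pixels).
    eigvals, U = np.linalg.eigh(A.T @ A)           # returned in ascending order
    order = np.argsort(eigvals)[::-1]              # sort in descending order
    eigvals, U = eigvals[order], U[:, order]
    keep = eigvals > 1e-10 * eigvals[0]            # at most M - 1 are non-zero
    eigvals, U = eigvals[keep], U[:, keep]
    V = A @ U                                      # v_i = A u_i is an eigenvector of S
    V /= np.linalg.norm(V, axis=0)                 # unit-length eigenfaces
    ratios = np.cumsum(eigvals) / eigvals.sum()    # cumulative share of total variance
    k = min(int(np.searchsorted(ratios, epsilon, side="right")) + 1, len(eigvals))
    return mean_face, V[:, :k], eigvals[:k]

def project(face, mean_face, V):
    """Weights of a face: one value per eigenface."""
    return V.T @ (face - mean_face)

def reconstruct(weights, mean_face, V):
    """Mean face plus the weighted sum of eigenfaces."""
    return mean_face + V @ weights

# Synthetic stand-in for a face database: 20 "images" of 32 x 32 pixels.
rng = np.random.default_rng(0)
pixels, M = 32 * 32, 20
basis = rng.normal(size=(pixels, 3))
coeffs = rng.normal(size=(3, M)) * np.array([[10.0], [3.0], [1.0]])
T = 0.5 + basis @ coeffs + 0.01 * rng.normal(size=(pixels, M))

mean_face, V, lam = eigenfaces(T, epsilon=0.95)
w = project(T[:, :1], mean_face, V)
approx = reconstruct(w, mean_face, V)
print(V.shape[1], "eigenfaces kept")
```

Because the synthetic data have three dominant directions, only a few eigenfaces pass the 95% threshold; with real faces the number depends on the data (the Matlab example discussed in this article reports 43 for the downsampled Extended Yale Face Database B).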
There may be a pattern that evaluates symmetry, whether there is any style of facial hair, where the hairline is, or the size of the nose or mouth. Other eigenfaces have patterns that are less simple to identify, and the image of the eigenface may look very little like a face. The technique used in creating eigenfaces and using them for recognition is also used outside of face recognition: handwriting recognition, lip reading, voice recognition, sign language/hand gesture interpretation and medical imaging analysis. For this reason, some prefer the term 'eigenimage' to 'eigenface'. === Practical implementation === To create a set of eigenfaces, one must: (1) Prepare a training set of face images. The pictures constituting the training set should have been taken under the same lighting conditions, and must be normalized to have the eyes and mouths aligned across all images. They must also all be resampled to a common pixel resolution (r × c). Each image is treated as one vector, simply by concatenating the rows of pixels in the original image, resulting in a single column with r × c elements. For this implementation, it is assumed that all images of the training set are stored in a single matrix T, where each column of the matrix is an image. (2) Subtract the mean. The average image a has to be calculated and then subtracted from each original image in T. (3) Calculate the eigenvectors and eigenvalues of the covariance matrix S. Each eigenvector has the same dimensionality (number of components) as the original images, and thus can itself be seen as an image. The eigenvectors of this covariance matrix are therefore called eigenfaces. They are the directions in which the images differ from the mean image. Usually this will be a computationally expensive step (if at all possible), but the practical applicability of eigenfaces stems from the possibility to compute the eigenvectors of S efficiently, without ever computing S explicitly, as detailed below. (4) Choose the principal components. Sort the eigenvalues in descending order and arrange the eigenvectors accordingly. The number of principal components k is determined arbitrarily by setting a threshold ε on the total variance. The total variance is $v = \lambda_1 + \lambda_2 + \dots + \lambda_n$, where n is the number of components and $\lambda_i$ is the eigenvalue of component i. Then k is the smallest number that satisfies $\frac{\lambda_1 + \lambda_2 + \dots + \lambda_k}{v} > \epsilon$. These eigenfaces can now be used to represent both existing and new faces: we can project a new (mean-subtracted) image on the eigenfaces and thereby record how that new face differs from the mean face. The eigenvalue associated with each eigenface represents how much the images in the training set vary from the mean image in that direction. Information is lost by projecting the image on a subset of the eigenvectors, but losses are minimized by keeping those eigenfaces with the largest eigenvalues. For instance, working with a 100 × 100 image will produce 10,000 eigenvectors. In practical applications, most faces can typically be identified using a projection on between 100 and 150 eigenfaces, so that most of the 10,000 eigenvectors can be discarded. === Matlab example code === Here is an example of calculating eigenfaces with Extended Yale Face Database B. To avoid computational and storage bottlenecks, the face images are downsampled by a factor of 4 × 4 = 16. Note that although the covariance matrix S generates many eigenfaces, only a fraction of those are needed to represent the majority of the faces. For example, to represent 95% of the total variation of all face images, only the first 43 eigenfaces are needed.
To calculate this result, implement the following code: === Computing the eigenvectors === Performing PCA directly on the covariance matrix of the images is often computationally infeasible. If small images are used, say 100 × 100 pixels, each image is a point in a 10,000-dimensional space and the covariance matrix S is a matrix of 10,000 × 10,000 = $10^8$ elements. However, the rank of the covariance matrix is limited by the number of training examples: if there are N training examples, there will be at most N − 1 eigenvectors with non-zero eigenvalues. If the number of training examples is smaller than the dimensionality of the images, the principal components can be computed more easily as follows. Let T be the matrix of preprocessed training examples, where each column contains one mean-subtracted image. The covariance matrix can then be computed as $S = TT^T$, and the eigenvector decomposition of S is given by $Sv_i = TT^T v_i = \lambda_i v_i$. However, $TT^T$ is a large matrix, and if instead we take the eigenvalue decomposition of $T^T T u_i = \lambda_i u_i$, then we notice that by pre-multiplying both sides of the equation with T, we obtain $TT^T T u_i = \lambda_i T u_i$. This means that if $u_i$ is an eigenvector of $T^T T$, then $v_i = Tu_i$ is an eigenvector of S. If we have

  • Mobile cloud storage

    Mobile cloud storage

    Mobile cloud storage is a form of cloud storage that is accessible on mobile devices such as laptops, tablets, and smartphones. Mobile cloud storage providers offer services that allow the user to create and organize files, folders, music, and photos, similar to other cloud computing models. Services are used by both individuals and companies. Most cloud file storage providers offer limited free use but charge for additional storage once the free limit is exceeded. These costs are usually charged as a monthly subscription, with rates that depend on the amount of storage desired. In 2018, cloud services revenue was about $182.4 billion, and it was projected to grow to $331.2 billion by 2022. The cloud storage industry was projected to grow 17.2 percent in 2019 (Costello, 2019). == History == The concept of cloud computing traces back to the 1960s, when the groundwork for modern internet and network technologies was being laid (Human for humans, 2024). One of the pivotal figures in this early period was J.C.R. Licklider, a visionary computer scientist who worked on ARPANET, the precursor to the internet. Licklider's ideas set the stage for the development of distributed computing systems, which are fundamental to cloud computing. Moving into the 1990s, AT&T introduced PersonaLink Services, a more advanced online platform offering electronic mail and online storage. === Major turning point in 2006 === The launch of Amazon Web Services (AWS) in 2006 marked a major turning point. AWS introduced Amazon S3 (Simple Storage Service), which allowed businesses and developers to store and retrieve any amount of data, at any time, from anywhere on the web. This development was revolutionary, providing scalable, reliable, and low-cost data storage infrastructure that transformed how organizations managed their data. == Applications == Some mobile device manufacturers include mobile cloud storage apps with their products.
These apps facilitate synchronization of user files across multiple platforms. Setting up a new mobile device frequently includes configuring a cloud storage service to back up the device's files and information. Apple iOS devices come pre-loaded and configured to use Apple's mobile cloud storage service, iCloud. Google offers a similar feature with the Android operating system by backing up the device to a Google Drive account. Samsung has partnered with Dropbox for its Galaxy smartphones, while Microsoft similarly offers Microsoft OneDrive. Some mobile cloud storage apps are platform-independent. For example, Nasuni's Mobile Access app is available on any Android or iOS device. Most companies offering cloud storage also provide a secure website for accessing files, allowing use on any device that can browse the Internet.

  • Visual servoing

    Visual servoing

    Visual servoing, also known as vision-based robot control and abbreviated VS, is a technique which uses feedback information extracted from a vision sensor (visual feedback) to control the motion of a robot. One of the earliest papers that talks about visual servoing was from the SRI International Labs in 1979. == Visual servoing taxonomy == There are two fundamental configurations of the robot end-effector (hand) and the camera: eye-in-hand, or end-point open-loop control, where the camera is attached to the moving hand and observes the relative position of the target; and eye-to-hand, or end-point closed-loop control, where the camera is fixed in the world and observes the target and the motion of the hand. Visual servoing control techniques are broadly classified into the following types: image-based (IBVS), position/pose-based (PBVS), and hybrid approaches. IBVS was proposed by Weiss and Sanderson. The control law is based on the error between current and desired features on the image plane, and does not involve any estimate of the pose of the target. The features may be the coordinates of visual features, lines or moments of regions. IBVS has difficulty with motions involving very large rotations, a problem that has come to be called camera retreat. PBVS is a model-based technique (with a single camera): the pose of the object of interest is estimated with respect to the camera, and then a command is issued to the robot controller, which in turn controls the robot. In this case the image features are extracted as well, but are additionally used to estimate 3D information (the pose of the object in Cartesian space), hence it is servoing in 3D. Hybrid approaches use some combination of 2D and 3D servoing. There have been a few different approaches to hybrid servoing: 2-1/2-D servoing, motion-partition-based servoing, and partitioned-DOF-based servoing. == Survey == The following description of the prior work is divided into three parts: a survey of existing visual servoing methods; the various features used and their impact on visual servoing; and error and stability analysis of visual servoing schemes. === Survey of existing visual servoing methods === Visual servo systems, also called servoing, have been around since the early 1980s, although the term visual servo itself was only coined in 1987. Visual servoing is, in essence, a method for robot control where the sensor used is a camera (visual sensor). Servoing consists primarily of two techniques. One uses information from the image to directly control the degrees of freedom (DOF) of the robot, and is thus referred to as image-based visual servoing (IBVS). The other involves the geometric interpretation of the information extracted from the camera, such as estimating the pose of the target and the parameters of the camera (assuming some basic model of the target is known). Other servoing classifications exist based on the variations in each component of a servoing system. Based on the location of the camera, the two kinds are eye-in-hand and hand–eye configurations. Based on the control loop, the two kinds are end-point-open-loop and end-point-closed-loop. Based on whether the control is applied to the joints (or DOF) directly or as a position command to a robot controller, the two types are direct servoing and dynamic look-and-move. In one of the earliest works, the authors proposed a hierarchical visual servo scheme applied to image-based servoing. The technique relies on the assumption that a good set of features can be extracted from the object of interest (e.g. edges, corners and centroids) and used as a partial model along with global models of the scene and robot. The control strategy is applied to a simulation of two- and three-DOF robot arms. Feddema et al. introduced the idea of generating the task trajectory with respect to the feature velocity. This is to ensure that the sensors are not rendered ineffective (stopping the feedback) for any of the robot motions.
The authors assume that the objects are known a priori (e.g. a CAD model) and that all the features can be extracted from the object. The work by Espiau et al. discusses some of the basic questions in visual servoing. The discussions concentrate on modeling of the interaction matrix, the camera, and visual features (points, lines, etc.). In another work, an adaptive servoing system was proposed with a look-and-move servoing architecture. The method used optical flow along with SSD to provide a confidence metric, and a stochastic controller with Kalman filtering for the control scheme. The system assumes (in the examples) that the plane of the camera and the plane of the features are parallel. Another work discusses an approach of velocity control using the Jacobian relationship $\dot{s} = J\dot{v}$. In addition, the author uses Kalman filtering, assuming that the extracted position of the target has inherent errors (sensor errors). A model of the target velocity is developed and used as a feed-forward input in the control loop. The work also mentions the importance of looking into kinematic discrepancy, dynamic effects, repeatability, settling-time oscillations and lag in response. Corke poses a set of very critical questions on visual servoing and tries to elaborate on their implications. The paper primarily focuses on the dynamics of visual servoing. The author tries to address problems like lag and stability, while also discussing feed-forward paths in the control loop. The paper also seeks justification for trajectory generation, the methodology of axis control and the development of performance metrics. Chaumette provides good insight into the two major problems with IBVS: first, servoing to a local minimum, and second, reaching a Jacobian singularity. The author shows that image points alone do not make good features due to the occurrence of singularities.
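The basic IBVS law whose failure modes Chaumette analyzes can be sketched as $v = -\lambda \hat{L}^{+} e$, where $e = s - s^*$ is the image-plane error and $\hat{L}$ stacks the interaction matrices (image Jacobians) of the features. The sketch below assumes normalized image-point features with known depths Z and uses the widely used point interaction matrix; the function names, the gain and the feature values are hypothetical. It also reports the condition number of $\hat{L}$, one of the singularity checks mentioned in this survey.

```python
import numpy as np

def point_interaction_matrix(x: float, y: float, Z: float) -> np.ndarray:
    """2 x 6 interaction matrix of a normalized image point (x, y) at depth Z.

    Columns correspond to the camera velocity (vx, vy, vz, wx, wy, wz).
    """
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_velocity(s, s_star, depths, gain=0.5):
    """Camera velocity v = -gain * pinv(L) @ (s - s_star) for stacked point features."""
    # L is evaluated at the desired features (one common choice; the current
    # features, or a mix of both, are alternatives).
    L = np.vstack([point_interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(s_star.reshape(-1, 2), depths)])
    e = s - s_star                               # image-plane feature error
    v = -gain * np.linalg.pinv(L) @ e            # drives e towards zero (first order)
    return v, np.linalg.cond(L)                  # large condition number: L is near-singular

# Desired view: four points forming a square, all at depth 1 (hypothetical values).
s_star = np.array([-0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1, 0.1])
depths = [1.0, 1.0, 1.0, 1.0]
# Current features, displaced to first order by a small camera motion v_true.
L_star = np.vstack([point_interaction_matrix(x, y, 1.0) for x, y in s_star.reshape(-1, 2)])
v_true = np.array([0.01, 0.0, 0.02, 0.0, 0.0, 0.01])
s = s_star + L_star @ v_true
v, cond = ibvs_velocity(s, s_star, depths)
print(v, cond)
```

Here the error lies in the range of $\hat{L}$, so the command is exactly −0.5 times `v_true`. With eight feature coordinates and six camera degrees of freedom, $\hat{L}^T$ has a non-trivial null space, and a non-zero error lying in it produces zero velocity; this is one way the local minima discussed above can arise.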
Chaumette's paper goes on to discuss possible additional checks to prevent singularities, namely the condition numbers of $J_s$ and $\hat{J}_s^{+}$, and checks on the null spaces of $\hat{J}_s$ and $J_s^{T}$. One main point that the author highlights is the relation between local minima and unrealizable image feature motions. Over the years many hybrid techniques have been developed. These involve computing partial or complete pose from epipolar geometry using multiple views or multiple cameras. The values are obtained by direct estimation or through a learning or statistical scheme. Others have used a switching approach that changes between image-based and position-based servoing based on a Lyapunov function. The early hybrid techniques that used a combination of image-based and pose-based (2D and 3D information) approaches for servoing required either a full or partial model of the object in order to extract the pose information, and used a variety of techniques to extract the motion information from the image. One such technique used an affine motion model from the image motion in addition to a rough polyhedral CAD model to extract the object pose with respect to the camera, in order to servo onto the object (along the lines of PBVS). 2-1/2-D visual servoing, developed by Malis et al., is a well-known technique that breaks down the information required for servoing in an organized fashion that decouples rotations and translations. The papers assume that the desired pose is known a priori. The rotational information is obtained from partial pose estimation, a homography (essentially 3D information), giving an axis of rotation and the angle (by computing the eigenvalues and eigenvectors of the homography). The translational information is obtained from the image directly by tracking a set of feature points. The only conditions are that the feature points being tracked never leave the field of view and that a depth estimate be predetermined by some off-line technique.
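The axis-and-angle representation of rotation used in 2-1/2-D servoing can be illustrated with NumPy. The sketch below assumes the rotation matrix R has already been extracted from the homography (that decomposition is not shown); it recovers the axis as the eigenvector of R with eigenvalue 1 and the angle from the trace. It therefore works on R rather than on the homography itself, and the function names are hypothetical.

```python
import numpy as np

def axis_angle(R: np.ndarray):
    """Unit rotation axis u and angle theta in [0, pi] of a 3 x 3 rotation matrix R."""
    eigvals, eigvecs = np.linalg.eig(R)
    # The axis is the eigenvector whose eigenvalue is 1 (it is left unchanged by R).
    idx = np.argmin(np.abs(eigvals - 1.0))
    u = np.real(eigvecs[:, idx])
    u /= np.linalg.norm(u)
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    # The skew-symmetric part of R is sin(theta) [u]_x; use it to fix the sign of u.
    skew = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]]) / 2.0
    sin_theta = float(np.dot(skew, u))
    if sin_theta < 0.0:
        u, sin_theta = -u, -sin_theta
    return u, float(np.arctan2(sin_theta, cos_theta))

def rotation(u: np.ndarray, theta: float) -> np.ndarray:
    """Rodrigues' formula, used here only to build a test rotation."""
    u = u / np.linalg.norm(u)
    K = np.array([[0.0, -u[2], u[1]], [u[2], 0.0, -u[0]], [-u[1], u[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

u_true = np.array([1.0, 2.0, 2.0]) / 3.0
R = rotation(u_true, 0.7)
u, theta = axis_angle(R)
print(u, theta)   # recovers u_true and 0.7
```

The eigenvector alone fixes the axis only up to sign, which is why the skew-symmetric part of R is used to choose the sign so that the angle lies in [0, π].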
2-1/2-D servoing has been shown to be more stable than the techniques that preceded it. Another interesting observation with this formulation is that the authors claim that the visual Jacobian will have no singularities during the motions. The hybrid technique developed by Corke and Hutchinson, popularly called the partitioned approach, partitions the visual (or image) Jacobian into motions (both rotations and translations) relating to the X and Y axes and motions related to the Z axis. The technique breaks out the columns of the visual Jacobian that correspond to Z-axis translation and rotation (namely, the third and sixth columns). The partitioned approach is shown to handle the Chaumette Conundrum. This technique requires a good depth estimate in order to function properly. Another hybrid approach splits the servoing task into two, namely main and secondary. The main task is to keep the features of interest within the field of view, while the secondary task is to mark a fixation point and use it as a reference to bring the camera to the desired pose. The technique does need a depth estimate from an off-line procedure. The paper discusses two examples for which depth estimates are obtained from robot odometry and by assuming that all

  • Eigenmoments

    Eigenmoments

    EigenMoments are a set of moments that are orthogonal, noise-robust, invariant to rotation, scaling and translation, and sensitive to the data distribution. Their application can be found in signal processing and computer vision as descriptors of the signal or image. The descriptors can later be used for classification purposes. They are obtained by performing orthogonalization, via eigen analysis, on geometric moments. == Framework summary == EigenMoments are computed by performing eigen analysis on the moment space of an image, maximizing the signal-to-noise ratio in the feature space in the form of a Rayleigh quotient. This approach has several benefits in image processing applications: The dependency of moments in the moment space on the distribution of the images being transformed ensures decorrelation of the final feature space after eigen analysis on the moment space. The ability of EigenMoments to take into account the distribution of the image makes them more versatile and adaptable for different genres. The generated moment kernels are orthogonal, and therefore analysis on the moment space becomes easier; transformation with orthogonal moment kernels into moment space is analogous to projection of the image onto a number of orthogonal axes. Noisy components can be removed, which makes EigenMoments robust for classification applications. Optimal information compaction can be obtained, and therefore only a few moments are needed to characterize the images. == Problem formulation == Assume that a signal vector $s \in \mathcal{R}^n$ is taken from a certain distribution having correlation $C \in \mathcal{R}^{n \times n}$, i.e. $C = E[ss^T]$, where $E[\cdot]$ denotes the expected value. The dimension n of the signal space is often too large to be useful for practical applications such as pattern classification, so we need to transform the signal space into a space with lower dimensionality.
This is performed by a two-step linear transformation: $q = W^T X^T s$, where $q = [q_1, \dots, q_k]^T \in \mathcal{R}^k$ is the transformed signal, $X = [x_1, \dots, x_m] \in \mathcal{R}^{n \times m}$ is a fixed transformation matrix which transforms the signal into the moment space, and $W = [w_1, \dots, w_k] \in \mathcal{R}^{m \times k}$ is the transformation matrix which we are going to determine by maximizing the SNR of the feature space spanned by $q$. For the case of geometric moments, the columns of X would be the monomials. If $m = k = n$, a full-rank transformation would result; however, usually we have $m \leq n$ and $k \leq m$. This is especially the case when $n$ is large. We seek the $W$ that maximizes the SNR of the feature space: $SNR_{transform} = \frac{w^T X^T C X w}{w^T X^T N X w}$, where N is the correlation matrix of the noise signal. The problem can thus be formulated as $w_1, \dots, w_k = \arg\max_w \frac{w^T X^T C X w}{w^T X^T N X w}$ subject to the constraints $w_i^T X^T N X w_j = \delta_{ij}$, where $\delta_{ij}$ is the Kronecker delta. It can be observed that this maximization is a Rayleigh quotient by letting $A = X^T C X$ and $B = X^T N X$, and therefore can be written as $w_1, \dots, w_k = \arg\max_w \frac{w^T A w}{w^T B w}$, subject to $w_i^T B w_j = \delta_{ij}$. === Rayleigh quotient === Optimization of the Rayleigh quotient has the form $\max_w R(w) = \max_w \frac{w^T A w}{w^T B w}$, where $A$ and $B$ are both symmetric and $B$ is positive definite and therefore invertible. Scaling $w$ does not change the value of the objective function, and hence an additional scalar constraint $w^T B w = 1$ can be imposed on $w$ without losing any solution when the objective function is optimized. This constrained optimization problem can be solved using a Lagrange multiplier: $\max_w w^T A w$ subject to $w^T B w = 1$, i.e. $\max_w \mathcal{L}(w) = \max_w (w^T A w - \lambda w^T B w)$. Setting the first derivative to zero, we have $Aw = \lambda Bw$, which is an instance of the generalized eigenvalue problem (GEP). The GEP has the form $Aw = \lambda Bw$; for any pair $(w, \lambda)$ that is a solution to this equation, $w$ is called a generalized eigenvector and $\lambda$ is called a generalized eigenvalue. Finding $w$ and $\lambda$ that satisfy this equation produces the result which optimizes the Rayleigh quotient. One way of maximizing the Rayleigh quotient is thus through solving the generalized eigenvalue problem. Dimension reduction can be performed by simply choosing the first components $w_i$, $i = 1, \dots, k$, with the highest values of $R(w)$ out of the $m$ components, and discarding the rest. This transformation can be interpreted as rotating and scaling the moment space, transforming it into a feature space with maximized SNR; therefore, the first $k$ components are the components with the $k$ highest SNR values. Another way to look at this solution is to use the concept of simultaneous diagonalization instead of the generalized eigenvalue problem. === Simultaneous diagonalization === Let $A = X^T C X$ and $B = X^T N X$ as mentioned earlier. We can write $W$ as the product of two separate transformation matrices: $W = W_1 W_2$. $W_1$ can be found by first diagonalizing B: $P^T B P = D_B$, where $D_B$ is a diagonal matrix sorted in increasing order. Since $B$ is positive definite, $D_B > 0$. We can discard the eigenvalues that are large and retain those close to 0, since this means the energy of the noise is close to 0 in this space; at this stage it is also possible to discard the eigenvectors that have large eigenvalues. Let $\hat{P}$ be the first $k$ columns of $P$; now $\hat{P}^T B \hat{P} = \hat{D}_B$, where $\hat{D}_B$ is the $k \times k$ principal submatrix of $D_B$. Let $W_1 = \hat{P} \hat{D}_B^{-1/2}$, and hence $W_1^T B W_1 = (\hat{P} \hat{D}_B^{-1/2})^T B (\hat{P} \hat{D}_B^{-1/2}) = I$.
$W_1$ whitens $B$ and reduces the dimensionality from $m$ to $k$. The transformed space spanned by $q' = W_1^T X^T s$ is called the noise space. Then, we diagonalize $W_1^T A W_1$: $W_2^T W_1^T A W_1 W_2 = D_A$, where $W_2^T W_2 = I$. $D_A$ is the matrix with the eigenvalues of $W_1^T A W_1$ on its diagonal. We may retain all the eigenvalues and their corresponding eigenvectors, since most of the noise has already been discarded in the previous step. Finally, the transformation is given by $W = W_1 W_2$, where $W$ diagonalizes both the numerator and the denominator of the SNR, $W^T A W = D_A$ and $W^T B W = I$, and the transformation of the signal $s$ is defined as $q = W^T X^T s = W_2^T W_1^T X^T s$. === Information loss === To find the information loss when we discard some of the eigenvalues and eigenvectors, we can perform the following analysis: $\eta = 1 - \frac{\operatorname{trace}(W_1^T A W_1)}{\operatorname{trace}(D_B^{-1/2} P^T A P D_B^{-1/2})} = 1 - \frac{\operatorname{trace}(\hat{D}_B^{-1/2} \hat{P}^T A \hat{P} \hat{D}_B^{-1/2})}{\operatorname{trace}(D_B^{-1/2} P^T A P D_B^{-1/2})}$
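The two-step simultaneous diagonalization and the information-loss measure η can be sketched with NumPy as follows. Everything about the data is an assumption made for illustration: a 1-D signal sampled at n points, monomial (geometric-moment) kernels as the columns of X, a correlation matrix C estimated from a hypothetical family of smooth random signals, and white noise $N = \sigma^2 I$. The function name `eigenmoments` is hypothetical.

```python
import numpy as np

def eigenmoments(X, C, N, k):
    """Return W (m x k), the SNR values on the diagonal of D_A, and the information loss eta."""
    A = X.T @ C @ X                        # numerator of the SNR
    B = X.T @ N @ X                        # denominator (noise), positive definite
    d_B, P = np.linalg.eigh(B)             # P^T B P = D_B, eigenvalues in increasing order
    # Step 1: keep the k least noisy directions and whiten B.
    W1 = P[:, :k] / np.sqrt(d_B[:k])       # W1 = P_hat D_B_hat^(-1/2), so W1^T B W1 = I
    # Step 2: diagonalize the whitened signal term with an orthogonal W2.
    d_A, W2 = np.linalg.eigh(W1.T @ A @ W1)
    order = np.argsort(d_A)[::-1]          # highest SNR first
    d_A, W2 = d_A[order], W2[:, order]
    W = W1 @ W2
    # Information loss from discarding the m - k noisiest directions.
    P_full = P / np.sqrt(d_B)              # P D_B^(-1/2)
    eta = 1.0 - np.trace(W1.T @ A @ W1) / np.trace(P_full.T @ A @ P_full)
    return W, d_A, eta

rng = np.random.default_rng(1)
n, m = 32, 6
t = np.linspace(-1.0, 1.0, n)
X = np.vander(t, m, increasing=True)       # columns t^0, ..., t^(m-1): geometric-moment kernels
freqs = np.arange(1, 4)
signals = rng.normal(size=(500, freqs.size)) @ np.cos(np.pi * np.outer(freqs, t))
C = signals.T @ signals / len(signals)     # estimated correlation E[s s^T]
N = 0.1 * np.eye(n)                        # assumed white noise, variance 0.1

W, snr, eta = eigenmoments(X, C, N, k=3)
q = W.T @ X.T @ signals[0]                 # EigenMoment features of one signal
print(snr, eta)
```

With `k = m` nothing is discarded, so η is zero and the diagonal of $D_A$ coincides with the generalized eigenvalues of $Aw = \lambda Bw$; for smaller `k`, η measures the share of the whitened signal energy that is dropped along with the noisiest directions.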

  • Screen generator

    Screen generator

    A screen generator, also known as a screen painter, screen mapper, or forms generator, is a software package (or a component of one) that lets data entry screens be generated declaratively, either by "painting" them on the screen WYSIWYG-style or by filling in forms, rather than by writing code to display them manually. 4GLs commonly incorporate a screen generator feature, and screen generators are also commonly bundled with database systems, especially entry-level databases. A screen generator is one aspect of an application generator, which can also include other functions such as report generation and a data dictionary. The earliest screen generators were character-based; by the 1990s GUI support had become common, followed by support for generating HTML forms. Some screen generators work by generating code in a high-level language (for example, COBOL) that displays the screen; others store the screen definition in a data file or in database tables and rely on a runtime component to display the form and to receive and validate user input. == Examples == Examples of screen generators include:
    - IBM Screen Definition Facility II, which generates screens for CICS BMS, IMS MFS, ISPF, GDDM and CSP/AD
    - Performix for Informix
    - Microsoft Visual Basic
    - the forms component of Microsoft Access
    - Oracle Developer, in particular its Oracle Forms component
    - the QDesign component of PowerHouse
    - SystemBuilder/SB+
    - the Screen Painter component of SAP's ABAP Workbench
    - the FoxView component of FoxPro. FoxView was originally developed by Luis Castro as a dBASE screen generator named ViewGen; Fox purchased it and bundled it with FoxPro 1.0. Later, Fox replaced Castro's code with its own screen painter code.
    - dBASE, which included a built-in screen generator from dBASE IV onwards; for dBASE III and earlier, third-party screen generators were available, including the already mentioned ViewGen
    - DPS 1100 for UNIVAC 1100 series mainframes
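The data-driven approach described for screen generators, where the screen definition is stored as data and a runtime component displays the form and validates input, can be sketched in a few lines of Python. Everything here is hypothetical: the `Field` record, the `CUSTOMER_SCREEN` definition and the `render`/`validate` functions are illustrations only and do not reproduce the file format or API of any product named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Field:
    """One input field of a declaratively defined, character-based screen."""
    name: str
    label: str
    row: int
    col: int
    width: int
    required: bool = False
    validator: Callable[[str], bool] = lambda value: True


# The screen definition is plain data, as if loaded from a file or table.
CUSTOMER_SCREEN: List[Field] = [
    Field("cust_id", "Customer ID", row=0, col=0, width=6, required=True, validator=str.isdigit),
    Field("name", "Name", row=1, col=0, width=20, required=True),
    Field("phone", "Phone", row=2, col=0, width=12),
]


def render(fields: List[Field], values: Dict[str, str]) -> str:
    """Runtime component: draw each field at its row/column position."""
    lines = [""] * (max(f.row for f in fields) + 1)
    for f in sorted(fields, key=lambda fld: (fld.row, fld.col)):
        shown = values.get(f.name, "")[: f.width].ljust(f.width)
        lines[f.row] = lines[f.row].ljust(f.col) + f"{f.label}: [{shown}]"
    return "\n".join(lines)


def validate(fields: List[Field], values: Dict[str, str]) -> Dict[str, str]:
    """Runtime component: check user input against the definition."""
    errors: Dict[str, str] = {}
    for f in fields:
        value = values.get(f.name, "").strip()
        if f.required and not value:
            errors[f.name] = "required"
        elif value and not f.validator(value):
            errors[f.name] = "invalid"
    return errors


print(render(CUSTOMER_SCREEN, {"cust_id": "42", "name": "Ada"}))
print(validate(CUSTOMER_SCREEN, {"cust_id": "12a", "name": ""}))
# {'cust_id': 'invalid', 'name': 'required'}
```

Because the layout and the validation rules live in `CUSTOMER_SCREEN` rather than in code, changing the screen means editing data, which is the point of the declarative approach.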

  • CMU Pronouncing Dictionary

    CMU Pronouncing Dictionary

    The CMU Pronouncing Dictionary (also known as CMUdict) is an open-source pronouncing dictionary originally created by the Speech Group at Carnegie Mellon University (CMU) for use in speech recognition research. CMUdict provides an orthographic-to-phonetic mapping for English words in their North American pronunciations. It is commonly used to generate representations for speech recognition (ASR), e.g. in the CMU Sphinx system, and for speech synthesis (TTS), e.g. in the Festival system. CMUdict can also serve as a training corpus for statistical grapheme-to-phoneme (g2p) models that generate pronunciations for words not yet included in the dictionary. The most recent release is 0.7b; it contains over 134,000 entries. An interactive lookup version is available. == Database format == The database is distributed as a plain text file with one entry per line in the format "WORD  <pronunciation>", with a two-space separator between the parts. If multiple pronunciations are available for a word, variants are identified using numbered versions (e.g. WORD(1)). The pronunciation is encoded using a modified form of the ARPABET system, with stress marks of levels 0, 1, and 2 added to vowels. A line-initial ;;; token indicates a comment. A derived format, directly suitable for speech recognition engines, is also available as part of the distribution; this format collapses stress distinctions, which are typically not used in ASR. The following is a table of phonemes used by the CMU Pronouncing Dictionary. == History == == Applications == The Unifon converter is based on the CMU Pronouncing Dictionary. The Natural Language Toolkit contains an interface to the CMU Pronouncing Dictionary. The Carnegie Mellon Logios tool incorporates the CMU Pronouncing Dictionary. PronunDict, a pronunciation dictionary of American English, uses the CMU Pronouncing Dictionary as its data source; its pronunciations are transcribed in IPA symbols, and it also supports searching by pronunciation.
Some singing voice synthesizers, such as CeVIO Creative Studio and Synthesizer V, use a modified version of the CMU Pronouncing Dictionary to synthesize English singing voices. Transcriber, a tool for full-text phonetic transcription, uses the CMU Pronouncing Dictionary. 15.ai, a real-time text-to-speech tool using artificial intelligence, also uses the CMU Pronouncing Dictionary.
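The CMUdict database format is simple enough to parse directly. The following Python sketch assumes the 0.7b-style layout described for the dictionary: a two-space separator, `WORD(1)`-style variants and `;;;` comments. The function names are hypothetical, and the sample lines are hand-written in that layout rather than copied from the distribution. `strip_stress` imitates the stress-collapsing derived format and `index_by_pronunciation` illustrates search by pronunciation; neither is claimed to match how the derived format or PronunDict is actually produced. NLTK's CMUdict interface is an alternative to hand-parsing.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# "WORD(1)", "WORD(2)", ... mark additional pronunciations of WORD.
VARIANT = re.compile(r"^(?P<word>.+?)\(\d+\)$")


def parse_cmudict(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each word to its pronunciations, each a list of ARPABET phones."""
    entries: Dict[str, List[List[str]]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blank lines and ";;;" comments
        word, _, pron = line.partition("  ")  # two-space separator
        match = VARIANT.match(word)
        if match:
            word = match.group("word")  # fold WORD(1) into WORD
        entries[word].append(pron.split())
    return dict(entries)


def strip_stress(phones: List[str]) -> List[str]:
    """Drop the 0/1/2 stress digits from vowels, e.g. 'EH1' -> 'EH'."""
    return [p.rstrip("012") for p in phones]


def index_by_pronunciation(entries: Dict[str, List[List[str]]]) -> Dict[str, List[str]]:
    """Reverse index from a stress-free pronunciation to the words that have it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word, prons in entries.items():
        for phones in prons:
            index[" ".join(strip_stress(phones))].append(word)
    return dict(index)


SAMPLE = [
    ";;; hand-written sample in the CMUdict 0.7b layout",
    "CAT  K AE1 T",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
    "RED  R EH1 D",
]

entries = parse_cmudict(SAMPLE)
print(entries["READ"])  # [['R', 'EH1', 'D'], ['R', 'IY1', 'D']]
print(index_by_pronunciation(entries)["R EH D"])  # ['READ', 'RED']
```

Collapsing stress before indexing makes homophones such as READ and RED share a key, which is the behaviour a pronunciation search usually wants.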

  • Apache OpenNLP

    Apache OpenNLP

    The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as language detection, tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing and coreference resolution. These tasks are usually required to build more advanced text processing services.
