AI For Business Guide

AI For Business Guide — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Brave Leo

    Brave Leo

    Brave Leo is a large language model-based chatbot developed by Brave Software and included with the Brave browser. == History == In November 2023, the company said versions for iOS and Android would be available "in the coming months". == Features == Since January 2024, Leo has used the open-source Mixtral 8x7B from Mistral AI as its default large language model, in addition to LLaMA 2 from Meta Platforms and Claude from Anthropic, both of which have been used previously. Leo can suggest follow-up questions, and summarize webpages, PDFs, and videos. Leo has a $15 (US) per month premium version that enables more requests and uses larger LLMs. == Privacy == The answers given by Leo are not saved. Brave uses the slogan Love Privacy to emphasize its focus on user privacy and data protection. The phrase has been featured in Brave's official marketing campaigns and has been cited in media coverage of the browser's privacy-first approach. == Controversies == In 2023, PC World reported that Leo evades questions about US elections.

    Read more →
  • Dental AI

    Dental AI

    Dental artificial intelligence (Dental AI) refers to the application of artificial intelligence (AI) and machine-learning methods to oral healthcare data. These systems can be used to find patterns or make predictions that can aid in diagnosis, treatment, patient communication, or practice management. == History and development == Research into AI for dentistry dates to the 1990s and 2000s, alongside early CAD/CAM and image-analysis work in dental radiology. Recent developments in deep learning, especially those involving computer vision, such as convolutional neural networks, trained on large image datasets, led to a rapid improvement in performance, as well as a move from prototype technology to productization suitable for use in dental chairs. Dental schools and continuing education programs started incorporating AI content in the 2020s. == Definition and core technologies == The dental AI software accomplishes this task by using various dental images and patient data. Dental images and data used by the dental AI software include bitewing and periapical X-rays, complete mouth X-rays, detailed 3D images, intraoral images, and the patient’s medical history. The dental AI software utilizes several core technologies in accomplishing its task of assisting the dentist. First, the dental AI software utilizes machine learning and deep learning using programs that can learn from examples. Such programs are referred to as convolutional neural network (CNN) and can detect cavities and identify bone changes related to gum disease. The dental AI software utilizes computer vision, which enables the AI software to identify and quantify important features in images and data, whether they are 2D images or 3D images. Natural language processing (NLP) is used for the AI software to understand written text and can automatically generate dental notes and communicate with the patient. Furthermore, the dental AI software utilizes predictive analytics to identify patients that are more prone to dental complications and can suggest the best intervals for checkups or future dental procedures. == Applications in dentistry == Reported clinical and operational applications include diagnostic assistance for caries and periodontal disease, treatment planning assistance, patient education overlays, quality assurance, curriculum assistance for dental education, and claims documentation. Systematic reviews continue to find image-based applications such as caries detection with some variability in study design and a need for prospective validation. == Academic research and clinical validation == Several peer-reviewed studies have measured the effectiveness of AI for applications such as interproximal caries detection and periodontal bone level assessment, showing improvements over unaided readings with a focus on bias within the dataset. The Dental AI Council found variability among clinicians for diagnosis and treatment planning, suggesting the use of a standard tool as an assist. == Industry adoption == Multiple vendors offer FDA-cleared chairside AI for dental imaging: Pearl — Received U.S. FDA 510(k) clearance for its real-time radiologic aid (“Second Opinion”) in 2022 (2D), with subsequent clearances including pediatric and CBCT (“Second Opinion 3D”). TIME gave “Second Opinion” a special mention on its Best Inventions of 2022 list. Overjet — FDA-cleared for bone-level quantification and detection/outline of caries and calculus (e.g., K210187), with additional clearances expanding capabilities. VideaHealth — Received an FDA 510(k) covering 30+ detections across common dental findings (K232384), including indications for patients ages 3 and up; trade coverage has described elements of this as the first pediatric dental-AI clearance. == Regulations == In the U.S., AI-enabled dental imaging software is generally reviewed via the FDA’s 510(k) pathway. The FDA maintains a public AI-Enabled Medical Devices List, which includes numerous medical-imaging AI tools (including dental). Specific dental clearances include Overjet (K210187), VideaHealth (K232384), and Pearl entries such as “Second Opinion 3D” (K243989).

    Read more →
  • Abdul Majid Bhurgri Institute of Language Engineering

    Abdul Majid Bhurgri Institute of Language Engineering

    Abdul Majid Bhurgri Institute of Language Engineering (Sindhi: عبدالماجد ڀرڳڙي انسٽيٽيوٽ آف لئنگئيج انجنيئرنگ) is an autonomous body under the administrative control of the Culture, Tourism and Antiquities Department, Government of Sindh established for bringing Sindhi language at par with national and international languages in all computational process and Natural language processing. == Establishment == In recognition to services of Abdul-Majid Bhurgri, who is the founder of Sindhi computing, Government of Sindh has established the institute after his name. The institute was primarily initiated on the concept given by a language engineer and linguist Amar Fayaz Buriro in briefing to the Minister, Culture, Tourism and Antiquities, Government of Sindh, Syed Sardar Ali Shah on 21 February 2017 on celebration of International Mother Language Day in Sindhi Language Authority, Hyderabad, Sindh. After the presentation and concept given by Amar Fayaz Buriro, the minister Syed Sardar Ali Shah had announced the Institute. Then, Government of Sindh added the development scheme in the Budget of fiscal year 2017-2018. == Projects == The Institute has developed several projects aimed at advancing the Sindhi language and promoting linguistic research. Notable initiatives include the AMBILE Hamiz Ali Sindhi Optical character recognition, which allows for the accurate digitization of Sindhi text, and the ongoing Sindhi WordNet System, a project to build a comprehensive lexical database for Natural language processing. The institute has also created the Font, which integrates symbols from the Indus script, Khudabadi script, and modern Perso-Arabic Script Code for Information Interchange into a single resource for researchers]. Additionally, institute has developed online converter tools that automatically transliterate between the Arabic-Perso script and Devanagari script, improving linguistic accessibility. Another key project is Bhittaipedia, a digital platform dedicated to the preservation and dissemination of the poetry of Shah Abdul Latif Bhittai, one of Sindh's most renowned poet. == Location == The institute is established behind Sindh Museum and Sindhi Language Authority, N-5 National Highway, Qasimabad, Hyderabad, Sindh.

    Read more →
  • Arabic Ontology

    Arabic Ontology

    Arabic Ontology is a website offering linguistic ontology services for the Arabic language which can be used like the online site WordNet. Users can use Arabic Ontology to classify or clarify the concepts and meanings of Arabic terms. == Ontology Structure == The ontology structure (i.e., data model) is similar to WordNet's structure. Each concept in the database is given a unique concept identifier (URI), informally described by a gloss, and lexicalized by one or more synonymous lemma terms. Each term-concept pair is called a sense, and is given a SenseID. A set of senses is called synset. Concepts and senses are described by further attributes such as era and area — to specify example usage and ontological analysis. Semantic relations are defined between concepts. Some important entities are included in the ontology, such as individual countries and bodies of water. These individuals are given separate IndividualIDs and linked with their concepts through the InstanceOf relation. == Mappings to other resources == Concepts in the Arabic Ontology are mapped to synsets in WordNet, as well as to BFO and DOLCE. Terms used in the Arabic Ontology are mapped to lemmas in the LDC's SAMA database. == Applications == Arabic Ontology can be used in many application domains, such as: Information retrieval, to enrich queries (e.g., in search engines) and improve the quality of the results, i.e. meaningful search rather than string-matching search; Machine translation and word-sense disambiguation, by finding the exact mapping of concepts across languages, especially that the Arabic ontology is also mapped to the WordNet; Data Integration and interoperability in which the Arabic ontology can be used as a semantic reference to link databases and information systems; Semantic Web and Web 3.0, by using the Arabic ontology as a semantic reference to disambiguate the meanings used in websites; among many other applications. == URLs Design == The URLs in the Arabic Ontology are designed according to the W3C's Best Practices for Publishing Linked Data, as described in the following URL schemes. This allows one to also explore the whole database like exploring a graph: Ontology Concept: Each concept in the Arabic Ontology has a ConceptID and can be accessed using: https://{domain}/concept/{ConceptID | Term}. In case of a term, the set of concepts that this term lexicalizes are all retrieved. In case of a ConceptID, the concept and its direct subtypes are retrieved, e.g. https://ontology.birzeit.edu/concept/293198 Semantic relations: Relationships between concepts can be accessed using these schemes: (i) the URL: https:// {domain}/concept/{RelationName}/{ConceptID} allows retrieval of relationships among ontology concepts. (ii) the URL: https://{domain}/lexicalconcept/{RelationName}/{lexicalConceptID} allows retrieval of relations between lexical concepts. For example, https://ontology.birzeit.edu/concept/instances/293121 retrieves the instances of the concept 293121. The relations that are currently used in our database are: {subtypes, type, instances, parts, related, similar, equivalent}.

    Read more →
  • Common data model

    Common data model

    A common data model (CDM) can refer to any standardised data model which allows for data and information exchange between different applications and data sources. Common data models aim to standardise logical infrastructure so that related applications can "operate on and share the same data", and can be seen as a way to "organize data from many sources that are in different formats into a standard structure". A common data model has been described as one of the components of a "strong information system". A standardised common data model has also been described as a typical component of a well designed agile application besides a common communication protocol. Providing a single common data model within an organisation is one of the typical tasks of a data warehouse. == Examples of common data models == === Border crossings === X-trans.eu was a cross-border pilot project between the Free State of Bavaria (Germany) and Upper Austria with the aim of developing a faster procedure for the application and approval of cross-border large-capacity transports. The portal was based on a common data model that contained all the information required for approval. === Climate data === The Climate Data Store Common Data Model is a common data model set up by the Copernicus Climate Change Service for harmonising essential climate variables from different sources and data providers. === General information technology === Within service-oriented architecture, S-RAMP is a specification released by HP, IBM, Software AG, TIBCO, and Red Hat which defines a common data model for SOA repositories as well as an interaction protocol to facilitate the use of common tooling and sharing of data. Content Management Interoperability Services (CMIS) is an open standard for inter-operation of different content management systems over the internet, and provides a common data model for typed files and folders used with version control. The NetCDF software libraries for array-oriented scientific data implements a common data model called the NetCDF Java common data model, which consists of three layers built on top of each other to add successively richer semantics. === Health === Within genomic and medical data, the Observational Medical Outcomes Partnership (OMOP) research program established under the U.S. National Institutes of Health has created a common data model for claims and electronic health records which can accommodate data from different sources around the world. PCORnet, which was developed by the Patient-Centered Outcomes Research Institute, is another common data model for health data including electronic health records and patient claims. The Sentinel Common Data Model was initially started as Mini-Sentinel in 2008. It is used by the Sentinel Initiative of the USA's Food and Drug Administration. The Generalized Data Model was first published in 2019. It was designed to be a stand-alone data model as well as to allow for further transformation into other data models (e.g., OMOP, PCORNet, Sentinel). It has a hierarchical structure to flexibly capture relationships among data elements. The JANUS clinical trial data repository also provides a common data model which is based on the SDTM standard to represent clinical data submitted to regulatory agencies, such as tabulation datasets, patient profiles, listings, etc. === Logistics === SX000i is a specification developed jointly by the Aerospace and Defence Industries Association of Europe (ASD) and the American Aerospace Industries Association (AIA) to provide information, guidance and instructions to ensure compatibility and the commonality. The associated SX002D specification contains a common data model. === Microsoft Common Data Model === The Microsoft Common Data Model is a collection of many standardised extensible data schemas with entities, attributes, semantic metadata, and relationships, which represent commonly used concepts and activities in various businesses areas. It is maintained by Microsoft and its partners, and is published on GitHub. Microsoft's Common Data Model is used amongst others in Microsoft Dataverse and with various Microsoft Power Platform and Microsoft Dynamics 365 services. === Rail transport === RailTopoModel is a common data model for the railway sector. === Other === There are many more examples of various common data models for different uses published by different sources.

    Read more →
  • Neural radiance field

    Neural radiance field

    A neural radiance field (NeRF) is a neural field for reconstructing a three-dimensional representation of a scene from two-dimensional images. The NeRF model enables downstream applications of novel view synthesis, scene geometry reconstruction, and obtaining the reflectance properties of the scene. Additional scene properties such as camera poses may also be jointly learned. First introduced in 2020, it has since gained significant attention for its potential applications in computer graphics and content creation. == Algorithm == The NeRF algorithm represents a scene as a radiance field parametrized by a deep neural network (DNN). The network predicts a volume density and view-dependent emitted radiance given the spatial location ( x , y , z ) {\displaystyle (x,y,z)} and viewing direction in Euler angles ( θ , Φ ) {\displaystyle (\theta ,\Phi )} of the camera. By sampling many points along camera rays, traditional volume rendering techniques can produce an image. === Data collection === A NeRF needs to be retrained for each unique scene. The first step is to collect images of the scene from different angles and their respective camera pose. These images are standard 2D images and do not require a specialized camera or software. Any camera is able to generate datasets, provided the settings and capture method meet the requirements for SfM (Structure from Motion). This requires tracking of the camera position and orientation, often through some combination of SLAM, GPS, or inertial estimation. Researchers often use synthetic data to evaluate NeRF and related techniques. For such data, images (rendered through traditional non-learned methods) and respective camera poses are reproducible and error-free. === Training === For each sparse viewpoint (image and camera pose) provided, camera rays are marched through the scene, generating a set of 3D points with a given radiance direction (into the camera). For these points, volume density and emitted radiance are predicted using the multi-layer perceptron (MLP). An image is then generated through classical volume rendering. Because this process is fully differentiable, the error between the predicted image and the original image can be minimized with gradient descent over multiple viewpoints, encouraging the MLP to develop a coherent model of the scene. == Variations and improvements == Early versions of NeRF were slow to optimize and required that all input views were taken with the same camera in the same lighting conditions. These performed best when limited to orbiting around individual objects, such as a drum set, plants or small toys. Since the original paper in 2020, many improvements have been made to the NeRF algorithm, with variations for special use cases. === Fourier feature mapping === In 2020, shortly after the release of NeRF, the addition of Fourier Feature Mapping improved training speed and image accuracy. Deep neural networks struggle to learn high frequency functions in low dimensional domains; a phenomenon known as spectral bias. To overcome this shortcoming, points are mapped to a higher dimensional feature space before being fed into the MLP. γ ( v ) = [ a 1 cos ⁡ ( 2 π B 1 T v ) a 1 sin ⁡ ( 2 π B 1 T v ) ⋮ a m cos ⁡ ( 2 π B m T v ) a m sin ⁡ ( 2 π B m T v ) ] {\displaystyle \gamma (\mathrm {v} )={\begin{bmatrix}a_{1}\cos(2{\pi }{\mathrm {B} }_{1}^{T}\mathrm {v} )\\a_{1}\sin(2\pi {\mathrm {B} }_{1}^{T}\mathrm {v} )\\\vdots \\a_{m}\cos(2{\pi }{\mathrm {B} }_{m}^{T}\mathrm {v} )\\a_{m}\sin(2{\pi }{\mathrm {B} }_{m}^{T}\mathrm {v} )\end{bmatrix}}} Where v {\displaystyle \mathrm {v} } is the input point, B i {\displaystyle \mathrm {B} _{i}} are the frequency vectors, and a i {\displaystyle a_{i}} are coefficients. This allows for rapid convergence to high frequency functions, such as pixels in a detailed image. === Bundle-adjusting neural radiance fields === One limitation of NeRFs is the requirement of knowing accurate camera poses to train the model. Often times, pose estimation methods are not completely accurate, nor is the camera pose even possible to know. These imperfections result in artifacts and suboptimal convergence. So, a method was developed to optimize the camera pose along with the volumetric function itself. Called Bundle-Adjusting Neural Radiance Field (BARF), the technique uses a dynamic low-pass filter (DLPF) to go from coarse to fine adjustment, minimizing error by finding the geometric transformation to the desired image. This corrects imperfect camera poses and greatly improves the quality of NeRF renders. === Multiscale representation === Conventional NeRFs struggle to represent detail at all viewing distances, producing blurry images up close and overly aliased images from distant views. In 2021, researchers introduced a technique to improve the sharpness of details at different viewing scales known as mip-NeRF (comes from mipmap). Rather than sampling a single ray per pixel, the technique fits a gaussian to the conical frustum cast by the camera. This improvement effectively anti-aliases across all viewing scales. mip-NeRF also reduces overall image error and is faster to converge at about half the size of ray-based NeRF. === Learned initializations === In 2021, researchers applied meta-learning to assign initial weights to the MLP. This rapidly speeds up convergence by effectively giving the network a head start in gradient descent. Meta-learning also allowed the MLP to learn an underlying representation of certain scene types. For example, given a dataset of famous tourist landmarks, an initialized NeRF could partially reconstruct a scene given one image. === NeRF in the wild === Conventional NeRFs are vulnerable to slight variations in input images (objects, lighting) often resulting in ghosting and artifacts. As a result, NeRFs struggle to represent dynamic scenes, such as bustling city streets with changes in lighting and dynamic objects. In 2021, researchers at Google developed a new method for accounting for these variations, named NeRF in the Wild (NeRF-W). This method splits the neural network (MLP) into three separate models. The main MLP is retained to encode the static volumetric radiance. However, it operates in sequence with a separate MLP for appearance embedding (changes in lighting, camera properties) and an MLP for transient embedding (changes in scene objects). This allows the NeRF to be trained on diverse photo collections, such as those taken by mobile phones at different times of day. === Relighting === In 2021, researchers added more outputs to the MLP at the heart of NeRFs. The output now included: volume density, surface normal, material parameters, distance to the first surface intersection (in any direction), and visibility of the external environment in any direction. The inclusion of these new parameters lets the MLP learn material properties, rather than pure radiance values. This facilitates a more complex rendering pipeline, calculating direct and global illumination, specular highlights, and shadows. As a result, the NeRF can render the scene under any lighting conditions with no re-training. === Plenoctrees === Although NeRFs had reached high levels of fidelity, their costly compute time made them useless for many applications requiring real-time rendering, such as VR/AR and interactive content. Introduced in 2021, Plenoctrees (plenoptic octrees) enabled real-time rendering of pre-trained NeRFs through division of the volumetric radiance function into an octree. Rather than assigning a radiance direction into the camera, viewing direction is taken out of the network input and spherical radiance is predicted for each region. This makes rendering over 3000x faster than conventional NeRFs. === Sparse Neural Radiance Grid === Similar to Plenoctrees, this method enabled real-time rendering of pretrained NeRFs. To avoid querying the large MLP for each point, this method bakes NeRFs into Sparse Neural Radiance Grids (SNeRG). A SNeRG is a sparse voxel grid containing opacity and color, with learned feature vectors to encode view-dependent information. A lightweight, more efficient MLP is then used to produce view-dependent residuals to modify the color and opacity. To enable this compressive baking, small changes to the NeRF architecture were made, such as running the MLP once per pixel rather than for each point along the ray. These improvements make SNeRG extremely efficient, outperforming Plenoctrees. === Instant NeRFs === In 2022, researchers at Nvidia enabled real-time training of NeRFs through a technique known as Instant Neural Graphics Primitives. An innovative input encoding reduces computation, enabling real-time training of a NeRF, an improvement orders of magnitude above previous methods. The speedup stems from the use of spatial hash functions, which have O ( 1 ) {\displaystyle O(1)} access times, and parallelized architectures which run fast on modern GPUs. == Related techniques == === Plenoxels === Plen

    Read more →
  • Text normalization

    Text normalization

    Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure. == Applications == Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example: "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan. "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words. Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé," then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed. == Techniques == For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions would suffice. For example, the sed script sed ‑e "s/\s+/ /g" inputfile would normalize runs of whitespace characters into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation. == Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization – for example in the extension of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic edition (or semi-diplomatic edition), in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements); and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor, and will vary. Some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or external criteria, where the norms of a different time period are applied. For an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary based on language of the edition as well as the specific conventions of the publisher.

    Read more →
  • Microsoft Teams

    Microsoft Teams

    Microsoft Teams is a team collaboration platform developed by Microsoft as part of the Microsoft 365 suite. It offers features such as workspace chat, video conferencing, file storage, and integration with both Microsoft and third-party applications and services. Teams gradually replaced earlier Microsoft messaging and collaboration platforms, including Skype for Business, Skype, Flip, and Microsoft Classroom. The platform saw significant growth during the COVID-19 pandemic, alongside competitors such as Zoom, Slack, and Google Meet, as organizations shifted to remote work and virtual meetings. As of January 2023, Microsoft reported approximately 280 million monthly active users. == History == On August 29, 2007, Microsoft acquired Parlano, the developer of the persistent group chat tool MindAlign. Years later, on March 4, 2016, Microsoft considered acquiring Slack for $8 billion. However, the proposal was reportedly opposed by Bill Gates, who advocated for focusing on enhancing Skype for Business instead. Lu Qi, then executive vice president of Applications and Services, had led the initiative to pursue the Slack acquisition. Following Lu's departure later that year, Microsoft announced Microsoft Teams on November 2, 2016, at an event in New York City, positioning it as a direct competitor to Slack. Teams launched worldwide on March 14, 2017. The service was initially led by corporate vice president Brian MacDonald. In response to the launch, Slack published a full-page advertisement in The New York Times welcoming the competition and outlining its product philosophy. Although Slack was used by 28 companies in the Fortune 100, The Verge wrote that executives would question paying for the service if Teams provides a similar function in their company's existing Office 365 subscription. However, ZDNET noted that the platforms initially served different markets, as Teams did not support external users, making it less appealing to small businesses and freelancers, a limitation Microsoft later addressed. In response to Teams' announcement, Slack deepened in-product integration with Google services. In May 2017, Microsoft announced that Teams would replace Microsoft Classroom in Office 365 Education. A free version of Teams was released on July 12, 2018, offering most core features at no cost, albeit with limits on users and storage. In January 2019, Microsoft introduced updates targeting "Firstline Workers" to improve Teams’ performance across shared or limited-access devices. In September 2019, Microsoft announced the retirement of Skype for Business in favor of Teams, which took effect on July 31, 2021. In early 2020, Microsoft introduced a push-to-talk "Walkie Talkie" feature aimed at firstline workers using smartphones and tablets over Wi-Fi or cellular networks. The COVID-19 pandemic significantly boosted usage of Teams. On March 19, 2020, Microsoft reported 44 million daily active users. In April, the platform logged 4.1 billion meeting minutes in a single day. A public preview of Microsoft Teams for Linux was released in December 2019, but the Linux client was discontinued in 2022. In July 2020, Microsoft shut down its video game livestreaming platform Mixer, and announced that some of its technologies would be repurposed for use in Teams. On February 28, 2025, Microsoft announced that Skype would be fully retired on May 5, 2025, with users given options to export their data or transition to Microsoft Teams. In October 2025, together with other Microsoft 365 suite apps, Teams had its logo updated. == Usage == == Underlying software == Microsoft Teams, as part of the Microsoft 365 suite, utilizes SharePoint and Exchange Online. Each Team, Shared Channel, and Private Channel has its own Microsoft 365 Group and SharePoint Site used for file storage. Messages are stored in Cosmos DB and are journaled to Exchange Online mailboxes. Private messages, including messages in Private Channels, are journaled to the sender and recipients' mailboxes. Public Channel messages are journaled to their corresponding Team's group mailbox, whereas, messages from Shared Channels are journaled to their own mailboxes. Contacts and voicemail are stored in Exchange Online. Microsoft Teams client is a web-based desktop app, originally developed on top of the Electron framework which combines the Chromium rendering engine and the Node.js JavaScript platform. Version 2.0 client was rebuilt using the Evergreen version of Microsoft Edge WebView2 in place of Electron. == Features == === Chats === Teams allows users to communicate in two-way persistent chats with one or multiple participants. Participants can message using text, emojis, stickers and gifs, as well as sharing links and files. In August 2022, the chat feature was updated for "chat with yourself"; allowing for the organization of files, notes, comments, images, and videos within a private chat tab. === Teams === Teams allows communities, groups, or teams to contribute in a shared workspace where messages and digital content on a specific topic are shared. Team members can join through an invitation sent by a team administrator or owner or sharing of a specific URL. Teams for Education allows admins and teachers to set up groups for classes, professional learning communities (PLCs), staff members, and everyone. === Channels === Channels allow team members to communicate without the use of email or group SMS (texting). Users can reply to posts with text, images, GIFs, and image macros. Direct messages send private messages to designated users rather than the entire channel. Connectors can be used within a channel to submit information contacted through a third-party service. Connectors include Mailchimp, Facebook Pages, Twitter, Power BI and Bing News. === Group conversations === Ad-hoc groups can be created to share instant messaging, audio calls (VoIP), and video calls inside the client software. === Telephone replacement === A feature on one of the higher cost licencing tiers allows connectivity to the public switched telephone network (PSTN) telephone system. This allows users to use Teams as if it were a telephone, making and receiving calls over the PSTN, including the ability to host "conference calls" with multiple participants. === Meeting === Meetings can be scheduled with multiple participants able to share audio, video, chat and presented content with all participants. Multiple users can connect via a meeting link. Automated minutes are possible using the recording and transcript features. Teams has a plugin for Microsoft Outlook to schedule a Teams Meeting in Outlook for a specific date and time and invite others to attend. If a meeting is scheduled within a channel, users visiting the channel are able to see if a meeting is in progress. ==== Teams Live Events ==== Teams Live Events replaces Skype Meeting Broadcast for users to broadcast to 10,000 participants on Teams, Yammer, or Microsoft Stream. ==== Breakout Rooms ==== Breakout rooms split a meeting into small groups. This is often utilized for collaboration during trainings or any environment where having all participants speak at once could be disruptive or unfeasible. Breakout rooms can be set by the hosts to a certain length of time, after which all participants will automatically rejoin the main meeting room. ==== Front Row ==== Front Row adjusts the layout of the viewer's screen, placing the speaker or content in the center of the gallery with other meeting participant's video feeds reduced in size and located below the speaker. === Education === Microsoft Teams for Education allows teachers to distribute, provide feedback, and grade student assignments turned in via Teams using the Assignments tab through Office 365 for Education subscribers. Quizzes can also be assigned to students through an integration with Office Forms. === Protocols === Microsoft Teams is based on a number of Microsoft-specific protocols. Video conferences are realized over the protocol MNP24, known from the Skype consumer version. VoIP and video conference clients based on SIP and H.323 need special gateways to connect to Microsoft Teams servers. With the help of Interactive Connectivity Establishment (ICE), clients behind Network address translation routers and restrictive firewalls are also able to connect, if peer-to-peer is not possible. === Integrations === Microsoft Teams has integrations through Microsoft AppSource, its integration marketplace. In 2020, Microsoft partnered with KUDO, a cloud-based solution with language interpretation, to allow integrated language meeting controls. In June 2022, an update was released using AI to improve call audio through the elimination of background feedback loops and cancelling non-vocal audio. == Anti-trust controversy == In July 2023, the European Commission opened an anti-trust investigation into the possibility that Microsoft unfairly used its office suite market power to increase sales of Teams and hurt

    Read more →
  • Lessac Technologies

    Lessac Technologies

    Lessac Technologies, Inc. (LTI) is an American firm which develops voice synthesis software, licenses technology and sells synthesized novels as MP3 files. The firm currently has seven patents granted and three more pending for its automated methods of converting digital text into human-sounding speech, more accurately recognizing human speech and outputting the text representing the words and phrases of said speech, along with recognizing the speaker's emotional state. The LTI technology is partly based on the work of the late Arthur Lessac, a Professor of Theater at the State University of New York and the creator of Lessac Kinesensic Training, and LTI has licensed exclusive rights to exploit Arthur Lessac's copyrighted works in the fields of speech synthesis and speech recognition. Based on the view that music is speech and speech is music, Lessac's work and books focused on body and speech energies and how they go together. Arthur Lessac's textual annotation system, which was originally developed to assist actors, singers, and orators in marking up scripts to prepare for performance, is adapted in LTI's speech synthesis system as the basic representation of the speech to be synthesized (Lessemes), in contrast to many other systems which use a phonetic representation. LTI's software has two major components: (1) a linguistic front-end that converts plain text to a sequence of prosodic and phonosensory graphic symbols (Lessemes) based on Arthur Lessac's annotation system, which specify the speech units to be synthesized; (2) a signal-processing back-end that takes the Lessemes as acoustic data and produces human-sounding synthesized speech as output, using unit selection and concatenation. LTI's text-to-speech system came in second in the world-wide Blizzard Challenge 2011 and 2012. The first-place team in 2011 also employed LTI's "front-end" technology, but with its own back-end. The Blizzard Challenge, conducted by the Language Technologies Institute of Carnegie Mellon University, was devised as a way to evaluate speech synthesis techniques by having different research groups build voices from the same voice-actor recordings, and comparing the results through listening tests. LTI was founded in 2000 by H. Donald Wilson (chairman), a lawyer, LexisNexis entrepreneur and business associate of Arthur Lessac; and Gary A. Marple (chief inventor), after Marple suggested that Arthur Lessac's kinesensic voice training might be applicable to computational linguistics. After Wilson's death in 2006, his nephew John Reichenbach became the firm's CEO.

    Read more →
  • Jaggaer

    Jaggaer

    JAGGAER, formerly SciQuest, is a provider of cloud-based business automation technology for Business Spend Management. Its headquarters is in Durham, North Carolina. == Company history == SciQuest was established in 1995 as a B2B eCommerce exchange.The company went public with an IPO in 1999. In 2001, SciQuest transitioned from a B2B exchange company into eProcurement software and supplier enablement platforms. SciQuest was taken private in 2004 and continued to move into eProcurement, inventory management and accounts payable automation. SciQuest completed an IPO in September 2010, raising approximately $57 million. SciQuest, and its 510 person workforce, was taken private in June 2016 as part of a $509 million acquisition by Accel-KKR, a private equity firm headquartered in Menlo Park, CA. In 2017 SciQuest was rebranded as JAGGAER and announced increased focus on offering a complete, integrated source-to-pay suite. Along with the name change, the company expanded its market focus to manufacturing, healthcare, consumer packaged goods, retail, education, life sciences, logistics and the public sector. JAGGAER acquired the European direct materials procurement specialist Pool4Tool in June 2017 giving it end-to-end direct as well as indirect materials procurement coverage. JAGGAER acquired spend management company BravoSolution in 2017, and entered into a joint venture with United Arab Emirates-based Tejari. In February 2019 JAGGAER launched JAGGAER One, which unifies its full product suite on a single platform. In 2019 the UK-based private equity firm Cinven acquired a majority holding in the company. Jim Bureau was subsequently named JAGGAER's Chief Executive Officer. Bureau left the firm in March 2023, and Andy Hovancik was announced as the company's CEO in June. In 2024, JAGGAER was acquired by Vista Equity Partners, a private equity firm specializing in enterprise software investments. == Current positioning == As of April 2025, JAGGAER positions itself as "an enterprise procurement and supplier collaboration SaaS provider." Its core technology platform, which is called JAGGAER One, serves "direct and indirect procurement with specializations in Higher Education, Discrete and Process Manufacturing, and Public Sector." == Product Categories == The JAGGAER One platform supports the following products: Spend Analytics Category Management Supplier Management Sourcing Contracts eProcurement Invoicing Inventory Management Supply Chain Collaboration Quality Management == Acquisitions == SciQuest acquired the following companies: AECsoft - January 2011. Provider of supplier management and sourcing technology. Upside Software, Inc. - August 2012. Provider of contract lifecycle management (CLM) solutions. Spend Radar, LLC - October 2012, Provider of spend analysis software. CombineNet - September 2013, Provider of advanced sourcing software JAGGAER acquired the following companies: POOL4TOOL - June 2017, Provider of direct sourcing and supply chain management software BravoSolution - December 2017, Provider of global platform spend management solutions

    Read more →
  • Image formation

    Image formation

    The study of image formation encompasses the radiometric and geometric processes by which 2D images of 3D objects are formed. In the case of digital images, the image formation process also includes analog to digital conversion and sampling. == Imaging == The imaging process is a mapping of an object to an image plane. Each point on the image corresponds to a point on the object. An illuminated object will scatter light toward a lens and the lens will collect and focus the light to create the image. The ratio of the height of the image to the height of the object is the magnification. The spatial extent of the image surface and the focal length of the lens determines the field of view of the lens. Image formation of mirror these have a center of curvature and its focal length of the mirror is half of the center of curvature. == Illumination == An object may be illuminated by the light from an emitting source such as the sun, a light bulb or a Light Emitting Diode. The light incident on the object is reflected in a manner dependent on the surface properties of the object. For rough surfaces, the reflected light is scattered in a manner described by the Bi-directional Reflectance Distribution Function (BRDF) of the surface. The BRDF of a surface is the ratio of the exiting power per square meter per steradian (radiance) to the incident power per square meter (irradiance). The BRDF typically varies with angle and may vary with wavelength, but a specific important case is a surface that has constant BRDF. This surface type is referred to as Lambertian and the magnitude of the BRDF is R/π, where R is the reflectivity of the surface. The portion of scattered light that propagates toward the lens is collected by the entrance pupil of the imaging lens over the field of view. == Field of view and imagery == The Field of view of a lens is limited by the size of the image plane and the focal length of the lens. The relationship between a location on the image and a location on the object is y = ftan(θ), where y is the max extent of the image plane, f is the focal length of the lens and θ is the field of view. If y is the max radial size of the image then θ is the field of view of the lens. While the image created by a lens is continuous, it can be modeled as a set of discrete field points, each representing a point on the object. The quality of the image is limited by the aberrations in the lens and the diffraction created by the finite aperture stop. == Pupils and stops == The aperture stop of a lens is a mechanical aperture which limits the light collection for each field point. The entrance pupil is the image of the aperture stop created by the optical elements on the object side of the lens. The light scattered by an object is collected by the entrance pupil and focused onto the image plane via a series of refractive elements. The cone of the focused light at the image plane is set by the size of the entrance pupil and the focal length of the lens. This is often referred to as the f-stop or f-number of the lens. f/# = f/D where D is the diameter of the entrance pupil. == Pixelation and color vs. monochrome == In typical digital imaging systems, a sensor is placed at the image plane. The light is focused on to the sensor and the continuous image is pixelated. The light incident on each pixel in the sensor will be integrated within the pixel and a proportional electronic signal will be generated. The angular geometric resolution of a pixel is given by atan(p/f), where p is the pitch of the pixel. This is also called the pixel field of view. The sensor may be monochrome or color. In the case of a monochrome sensor, the light incident on each pixel is integrated and the resulting image is a grayscale like picture. For color images, a mosaic color filter is typically placed over the pixels to create a color image. An example is a Bayer filter. The signal incident on each pixel is then digitized to a bit stream. == Image quality == The quality of an image is dependent upon both geometric and physical items. Geometrically, higher density of pixels across an image will give less blocky pixelation and thus a better geometric image quality. Lens aberrations also contribute to the quality of the image. Physically, diffraction due to the aperture stop will limit the resolvable spatial frequencies as a function of f-number. In the frequency domain, Modulation Transfer Function (MTF) is a measure of the quality of the imaging system. The MTF is a measure of the visibility of a sinusoidal variation in irradiance on the image plane as a function of the frequency of the sinusoid. It includes the effects of diffraction, aberrations and pixelation. For the lens, the MTF is the autocorrelation of the pupil function, so it accounts for the finite pupil extent and the lens aberrations. The sensor MTF is the Fourier Transform of the pixel geometry. For a square pixel, MTF(ξ) = sin(πξp)/πξp where p is the pixel width and ξ is the spatial frequency. The MTF of the combination of the lens and detector is the product of the two component MTFs. == Perception == Color images can be perceived via two means. In the case of computer vision the light incident on the sensor comprises the image. In the case of visual perception, the human eye has a color dependent response to light so this must be accounted for. This is important consideration when converting to grayscale. == Image formation in eye == The principal difference between the lens of the eye and an ordinary optical lens is that the former is flexible. The radius of the curvature of the anterior surface of the lens is greater than the radius of its posterior surface. The shape of the lens is controlled by tension in the fibers of the ciliary body. To focus on distant objects, the controlling muscles cause the lens to be relatively flattened. Similarly, these muscles allow the lens to become thicker in order to focus on objects near the eye. The distance between the center of the lens and the retina (focal length) varies from approximately 17 mm to about 14 mm, as the refractive power of the lens increases from its minimum to its maximum. When the eye focuses on an object farther away than about 3 m, the lens exhibits its lowest refractive power. When the eye focuses on a close object, the lens is most strongly refractive.

    Read more →
  • Region Based Convolutional Neural Networks

    Region Based Convolutional Neural Networks

    Region-based Convolutional Neural Networks (R-CNN) are a family of machine learning models for computer vision, and specifically object detection and localization. The original goal of R-CNN was to take an input image and produce a set of bounding boxes as output, where each bounding box contains an object and also the category (e.g. car or pedestrian) of the object. In general, R-CNN architectures perform selective search over feature maps outputted by a CNN. R-CNN has been extended to perform other computer vision tasks, such as: tracking objects from a drone-mounted camera, locating text in an image, and enabling object detection in Google Lens. Mask R-CNN is also one of seven tasks in the MLPerf Training Benchmark, which is a competition to speed up the training of neural networks. == History == The following covers some of the versions of R-CNN that have been developed. November 2013: R-CNN. April 2015: Fast R-CNN. June 2015: Faster R-CNN. March 2017: Mask R-CNN. December 2017: Cascade R-CNN is trained with increasing Intersection over Union (IoU, also known as the Jaccard index) thresholds, making each stage more selective against nearby false positives. June 2019: Mesh R-CNN adds the ability to generate a 3D mesh from a 2D image. == Architecture == For review articles see. === Selective search === Given an image (or an image-like feature map), selective search (also called Hierarchical Grouping) first segments the image by the algorithm in (Felzenszwalb and Huttenlocher, 2004), then performs the following: Input: (colour) image Output: Set of object location hypotheses L Segment image into initial regions R = {r1, ..., rn} using Felzenszwalb and Huttenlocher (2004) Initialise similarity set S = ∅ foreach Neighbouring region pair (ri, rj) do Calculate similarity s(ri, rj) S = S ∪ s(ri, rj) while S ≠ ∅ do Get highest similarity s(ri, rj) = max(S) Merge corresponding regions rt = ri ∪ rj Remove similarities regarding ri: S = S \ s(ri, r∗) Remove similarities regarding rj: S = S \ s(r∗, rj) Calculate similarity set St between rt and its neighbours S = S ∪ St R = R ∪ rt Extract object location boxes L from all regions in R === R-CNN === With R-CNN, prediction follows a two-step process. A preprocessing selective search step generates a large set of candidate objects (typically as many as 2000), known as regions of interest (ROI). These are forwarded to a CNN, which predicts an object class score and bounding box estimate, independently for each ROI. Importantly, the ROIs are heavily filtered to remove excess candidates. This is achieved using two mechanism. Filtering begins by removing ROIs assigned to the background category. This is a specialized category, which is scored by the CNN alongside other categories. An unfortunate reality is that remaining ROIs typically suffer from heavy duplication. Namely, multiple ROIs that cover same objects in the image are all assigned non-background categories. This is resolved by a heuristic non-maximum suppression (NMS) step. === Fast R-CNN === While the original R-CNN independently computed the neural network features on each of as many as two thousand regions of interest, Fast R-CNN runs the neural network once on the whole image. At the end of the network is a ROIPooling module, which slices out each ROI from the network's output tensor, reshapes it, and classifies it. As in the original R-CNN, the Fast R-CNN uses selective search to generate its region proposals. === Faster R-CNN === While Fast R-CNN used selective search to generate ROIs, Faster R-CNN integrates the ROI generation into the neural network itself. === Mask R-CNN === While previous versions of R-CNN focused on object detections, Mask R-CNN adds instance segmentation. Mask R-CNN also replaced ROIPooling with a new method called ROIAlign, which can represent fractions of a pixel.

    Read more →
  • Compute (machine learning)

    Compute (machine learning)

    In machine learning and deep learning, compute is the amount of computing power or computational resources required to train machine learning models and large language models. More broadly, compute is the computational power or resources necessary for a computer or computer program to function. == Definition == Compute is commonly defined as the amount of computing power or computational resources required to train machine learning and large language models. The term "compute" has also been more broadly applied to cloud computing, referencing processing power, memory, networking, storage, and other resources required for the computation of any program. Compute is measured in petaflop/s-days and is used to document AI training. A petaflop/s-day (pfs-day) consists of performing 1015 neural net operations per second for one day, or a total of about 1020 operations. The compute-time product serves as a mental convenience, similar to kilowatt-hour for energy. An amount of compute is meant to give an idea of the number of actual operations performed. == History == In a 2018 analysis titled "AI and compute", artificial intelligence company OpenAI introduced the concept of compute. OpenAI identified two eras of training AI systems in terms of compute-usage. From 1959 to 2012, compute roughly followed Moore’s law. Between 2012 and 2018, the amount of compute used in the largest AI training runs increased exponentially, growing by more than 300,000 times — roughly doubling every 3.4 months. By comparison, Moore’s Law doubled every two years over the same period. One of the largest models, released in 2020, used 600,000 times more computing power than the 2012 model. After 2020, compute growth began to slow down, with the compute needed for the largest AI models continuing to slow down in 2023. The notion of compute has become increasingly used from the mid-2020s onwards. == Compute growth and AI progress == Larger AI models trained on more data and using more computational resources, tend to perform better. This happens even if the algorithms themselves remain unchanged. As early as 2018, OpenAI noted the exponential increase in compute to be have a key role in AI progress. OpenAI considers three factors drive the advance of AI: algorithmic innovation, data, and the amount of compute available for training. AI models with more compute not only improve in the tasks they were trained on but can develop emergent abilities. Incremental improvements can lead to more abrupt leaps in capabilities. AI provider SpaceXAI said in 2026 that their AI progress is driven by compute and used it a key metric in the AI training of its supercomputer Colossus, the which contains 1 million GPUs. Anthropic has a contract of $1.25 billion per month with SpaceXAI to buy all the compute capacity at Colossus 1 data center. === Criticism and policy === Increasing, promoting or constraining progress in artificial intelligence has often be done via controlling the amount of compute. Policymarkers have enacted policies and provided support to make compute resources more accessible to domestic AI researchers. In a January 2022 report, the Center for Security and Emerging Technology (CSET) suggested to institutions that increasingly powerful and generalizable AI (AGI) will likely require other strategies than maximizing compute. Some AI researchers are also concerned that government might exclusively focus on scaling compute instead of other strategies. The CSET has reported on the various bottlenecks which could explain why deep learning needs for compute have slow down: training is expensive and training extremely large models generates traffic jams across many processors that are difficult to manage. there is a limited supply of AI chips (see AI chip memory shortage). CSET advances that the main resource is human capital, specifically talented researchers — according to a 2023 published survey of more than 400 AI researchers, academic and private sector workers. The survey found that AI researchers are not primarily or exclusively constrained by compute access. However, both academic and industry AI researchers equally report concerns that insufficient compute could prevent them from contributing meaningfully to AI research in the future. High compute users are more concerned about compute access. When asked about which resource provided by the government would be the most useful to them, some AI researchers select compute, other prefer grant funding. For this goal, CSET advised policymakers to ensure that even researchers with smaller budgets could effectively contribute to AI research. Other proposed strategies include using contemporary AI algorithms, managing modern AI infrastructure or focusing on interdisciplinary work between the AI field and other fields of computer science. A 2024 study on compute access found that academic-only AI research teams often have less compute intensive research topics, especially foundation models, compared to industry AI labs. As a consequence, academia is likely to play a smaller role in advancing such techniques. The researchers suggest nationally-sponsored computing infrastructure as well as open science initiatives to boost academic compute access. === Data === A 2022 study found that current large language models are significantly under-trained, a consequence of focusing on scaling language models whilst keeping the amount of training data constant. By training over 400 language models of various parameter and token size, they found that "for compute-optimal training", the model size and the number of training tokens should ideally be scaled equally: for every doubling of model size the number of training tokens should also be doubled.

    Read more →
  • ReRites

    ReRites

    ReRites (also known as RERITES, ReadingRites, Big Data Poetry) is a literary work of "Human + A.I. poetry" by David Jhave Johnston that used neural network models trained to generate poetry which the author then edited. ReRites won the Robert Coover Award for a Work of Electronic Literature in 2022. == About the project == The ReRites project began as a daily rite of writing with a neural network, expanded into a series of performances from which video documentation has been published online, and concluded with a set of 12 books and an accompanying book of essays published by Anteism Books in 2019. In Electronic Literature, Scott Rettberg describes the early phases of the project in 2016, when it bore the preliminary name Big Data Poetry. Jhave (the artist name that David Jhave Johnston goes by) describes the process of writing ReRites as a rite: "Every morning for 2 hours (normally 6:30–8:30am) I get up and edit the poetic output of a neural net. Deleting, weaving, conjugating, lineating, cohering. Re-writing. Re-wiring authorship: hybrid augmented enhanced evolutionary". There is video documentation of the writing process. The human editing of the neural network's output is fundamental to this project, and Jhave gives examples of both unedited text extracts and his edited versions in publications about the project. Kyle Booten describes ReRites as "simultaneously dusty and outrageously verdant, monotonously sublime and speckled with beautiful and rare specimens". === Performances === ReRites was first shared with an audience through a series of performances where audience members and poets would participate in reading the automatically generated texts, which appeared on screen so fast that human readers could barely keep up. This has been described as allowing participants to "re-discover[..] the peculiar pleasures of being embodied", or, in Jhave's own words, as a space where human participants were "playing their wits and voices against an evocative infinite deep-learning muse". The first performance was at Brown University's Interrupt Festival in 2019. It has been performed many times since, including at the Barbican Centre in London and Anteism Books. === Print publications === For a single year Jhave published one book of poetry from the ReRites project each month. These twelve volumes are accompanied by a book of essays, all published by Anteism Books. The accompanying essays provide critical responses to the project from poets and scholars including Allison Parrish, Johanna Drucker, Kyle Booten, Stephanie Strickland, John Cayley, Lai-Tze Fan, Nick Montfort, Mairéad Byrne, and Chris Funkhouser. Allison Parrish notes elsewhere that these paratexts to ReRites serve a legitimising function for a genre of poetry that is not yet institutionally acknowledged. === Technical details === Starting in 2016 under the name Big Data Poetry, Jhave generated poems using, in his own words, "neural network code (..) adapted from three corporate github-hosted machine-learning libraries: TensorFlow (Google), PyTorch (Facebook), and AWD-LSTM (SalesForce)". He explains that the "models were trained on a customised corpus of 600,000 lines of poetry ranging from the romantic epoch to the 20th century avant garde". Jhave maintains a GitHub repository with some of the code supporting ReRites. == Reception == ReRites is described by John Cayley as "one of the most thorough and beautiful" poetic responses to machine learning. The work's influence on the field of electronic literature was acknowledged in 2022, when the work won the Electronic Literature Organization's Robert Coover Award for a Work of Electronic Literature. The jury described ReRites as particularly poignant in the time of the pandemic, as it was "a documentation of the performance of the private ritual of writing and the obsessive-compulsive need for writers to communicate — even when no one else is reading". The question of authorship and voice in ReRites has been raised by several critics. Although generated poetry is an established genre in electronic literature, Cayley notes that unlike the combinatory poems created by authors like Nick Montfort, where the author explicitly defines which words and phrases will be recombined, ReRites has "not been directed by literary preconceptions inscribed in the program itself, but only by patterns and rhythms pre-existing in the corpora". In an essay for the Australian journal TEXT, David Thomas Henry Wright asks how to understand authorship and authority in ReRites: "Who or what is the authority of the work? The original data fed into the machine, that is not currently retrievable or discernible from the final works? The code that was taken and adapted for his purposes? Or Jhave, the human editor?" Wright concludes that Jhave is the only actor with any intentionality and therefore the authority of the work. The centrality of the human editor is also emphasised by other scholars. In a chapter analysing ReRites Malthe Stavning Erslev argues that the machine learning misrepresents the dataset it is trained on. While ReRites uses 21st century neural networks, it has been compared to earlier literary traditions. Poet Victoria Stanton, who read at one of the ReRites performances, has compared ReRites to found poetry, while David Thomas Henry Wright compares it to the Oulipo movement and Mark Amerika to the cut-up technique. Scholars also position ReRites firmly within the long tradition of generative poetry both in electronic literature and print, stretching from the I Ching, Queneau's Cent Mille Milliards de Poemes and Nabokov's Pale Fire to computer-generated poems like Christopher Strachey's Love Letter Generator (1952) and more contemporary examples. Jhave describes the process of working with the output from the neural network as "carving". In his book My Life as an Artificial Creative Intelligence, Mark Amerika writes that the "method of carving the digital outputs provided by the language model as part of a collaborative remix jam session with GPT-2, where the language artist and the language model play off each other’s unexpected outputs as if caught in a live postproduction set, is one I share with electronic literature composer David Jhave Johnston, whose AI poetry experiments precede my own investigations."

    Read more →
  • Scene text

    Scene text

    Scene text is text that appears in an image captured by a camera in an outdoor environment. The detection and recognition of scene text from camera captured images are computer vision tasks which became important after smart phones with good cameras became ubiquitous. The text in scene images varies in shape, font, colour and position. The recognition of scene text is further complicated sometimes by non-uniform illumination and focus. To improve scene text recognition, the International Conference on Document Analysis and Recognition (ICDAR) conducts a robust reading competition once in two years. The competition was held in 2003, 2005 and during every ICDAR conference. International association for pattern recognition (IAPR) has created a list of datasets as Reading systems. == Text detection == Text detection is the process of detecting the text present in the image, followed by surrounding it with a rectangular bounding box. Text detection can be carried out using image based techniques or frequency based techniques. In image based techniques, an image is segmented into multiple segments. Each segment is a connected component of pixels with similar characteristics. The statistical features of connected components are utilised to group them and form the text. Machine learning approaches such as support vector machine and convolutional neural networks are used to classify the components into text and non-text. In frequency based techniques, discrete Fourier transform (DFT) or discrete wavelet transform (DWT) are used to extract the high frequency coefficients. It is assumed that the text present in an image has high frequency components and selecting only the high frequency coefficients filters the text from the non-text regions in an image. == Word recognition == In word recognition, the text is assumed to be already detected and located and the rectangular bounding box containing the text is available. The word present in the bounding box needs to be recognized. The methods available to perform word recognition can be broadly classified into top-down and bottom-up approaches. In the top-down approaches, a set of words from a dictionary is used to identify which word suits the given image. Images are not segmented in most of these methods. Hence, the top-down approach is sometimes referred as segmentation free recognition. In the bottom-up approaches, the image is segmented into multiple components and the segmented image is passed through a recognition engine. Either an off the shelf Optical character recognition (OCR) engine or a custom-trained one is used to recognise the text.

    Read more →