AI Chatbot Soulmate

AI Chatbot Soulmate — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Videotex

    Videotex

    Videotex (or interactive videotex) was one of the earliest implementations of an end-user information system. From the late 1970s to early 2010s, it was used to deliver information (usually pages of text) to a user in computer-like format, typically to be displayed on a television or a dumb terminal. In a strict definition, videotex is any system that provides interactive content and displays it on a video monitor such as a television, typically using modems to send data in both directions. A close relative is teletext, which sends data in one direction only, typically encoded in a television signal. All such systems are occasionally referred to as viewdata. Unlike the modern Internet, traditional videotex services were highly centralized. Videotex in its broader definition can be used to refer to any such service, including teletext, the Internet, bulletin board systems, online service providers, and even the arrival/departure displays at an airport. This usage is no longer common. With the exception of Minitel in France, videotex elsewhere never managed to attract any more than a very small percentage of the universal mass market once envisaged. By the end of the 1980s its use was essentially limited to a few niche applications. == Initial development and technologies == === United Kingdom === The first attempts at a general-purpose videotex service were created in the United Kingdom in the late 1960s. In about 1970 the BBC had a brainstorming session in which it was decided to start researching ways to send closed captioning information to the audience. As the Teledata research continued the BBC became interested in using the system for delivering any sort of information, not just closed captioning. In 1972, the concept was first made public under the new name Ceefax. Meanwhile, the General Post Office (soon to become British Telecom) had been researching a similar concept since the late 1960s, known as Viewdata. Unlike Ceefax which was a one-way service carried in the existing TV signal, Viewdata was a two-way system using telephones. Since the Post Office owned the telephones, this was considered to be an excellent way to drive more customers to use the phones. Not to be outdone by the BBC, they also announced their service, under the name Prestel. ITV soon joined the fray with a Ceefax-clone known as ORACLE. In 1974, all the services agreed on a standard for displaying the information. The display would be a simple 40×24 grid of text, with some "graphics characters" for constructing simple graphics, revised and finalized in 1976. The standard did not define the delivery system, so both Viewdata-like and Teledata-like services could at least share the TV-side hardware, which was expensive at the time. The standard also introduced a new term that covered all such services, teletext. Ceefax first started operation in 1974 with a limited 30 pages, followed quickly by ORACLE and then Prestel in 1979. By 1981, Prestel International was available in nine countries, and a number of countries, including Sweden, The Netherlands, Finland and West Germany were developing their own national systems closely based on Prestel. General Telephone and Electronics (GTE) acquired an exclusive agency for the system for North America. In the early 1980s, videotex became the base technology for the London Stock Exchange's pricing service called TOPIC. Later versions of TOPIC, notably TOPIC2 and TOPIC3, were developed by Thanos Vassilakis and introduced trading and historic price feeds. === France === Development of a French teletext-like system began in 1973. A very simple 2-way videotex system called Tictac was also demonstrated in the mid-1970s. As in the UK, this led on to work to develop a common display standard for videotex and teletext, called Antiope, which was finalised in 1977. Antiope had similar capabilities to the UK system for displaying alphanumeric text and chunky "mosaic" character-based block graphics. A difference however was that while in the UK standard control codes automatically also occupied one character position on screen, Antiope allowed for "non spacing" control codes. This gave Antiope slightly more flexibility in the use of colours in mosaic block graphics, and in presenting the accents and diacritics of the French language. Meanwhile, spurred on by the 1978 Nora/Minc report, the French government was determined to catch up on a perceived falling behind in its computer and communications facilities. In 1980 it began field trials issuing Antiope-based terminals for free to over 250,000 telephone subscribers in Ille-et-Vilaine region, where the French CCETT research centre was based, for use as telephone directories. The trial was a success, and in 1982 Minitel was rolled out nationwide. === Canada === Since 1970, researchers at the Communications Research Centre (CRC) in Ottawa had been working on a set of "picture description instructions", which encoded graphics commands as a text stream. Graphics were encoded as a series of instructions (graphics primitives) each represented by a single ASCII character. Graphic coordinates were encoded in multiple 6 bit strings of XY coordinate data, flagged to place them in the printable ASCII range so that they could be transmitted with conventional text transmission techniques. ASCII SI/SO characters were used to differentiate the text from graphic portions of a transmitted "page". In 1975, the CRC gave a contract to Norpak to develop an interactive graphics terminal that could decode the instructions and display them on a colour display, which was successfully up and running by 1977. Against the background of the developments in Europe, CRC was able to persuade the Canadian government to develop the system into a fully-fledged service. In August 1978, the Canadian Department of Communications publicly launched it as Telidon, a "second generation" videotex/teletext service, and committed to a four-year development plan to encourage rollout. Compared to the European systems, Telidon offered real graphics, as opposed to block-mosaic character graphics. The downside was that it required much more advanced decoders, typically featuring Zilog Z80 or Motorola 6809 processors. === Japan === Research in Japan was shaped by the demands of the large number of Kanji characters used in Japanese script. With 1970s technology, the ability to generate so many characters on demand in the end-user's terminal was seen as prohibitive. Instead, development focussed on methods to send pages to user terminals pre-rendered, using coding strategies similar to facsimile machines. This led to a videotex system called Captain ("Character and Pattern Telephone Access Information Network"), created by NTT in 1978, which went into full trials from 1979 to 1981. The system also lent itself naturally to photographic images, albeit at only moderate resolution. However, the pages typically took two or three times longer to load, compared to the European systems. NHK developed an experimental teletext system along similar lines, called CIBS ("Character Information Broadcasting Station"). Based on a 388×200 pixel resolution, it was first announced in 1976, and began trials in late 1978. (NHK's ultimate production teletext system launched in 1983). == Standards == Work to establish an international standard for videotex began in 1978 in CCITT. But the national delegations showed little interest in compromise, each hoping that their system would come to define what was perceived to be going to be an enormous new mass-market. In 1980 CCITT therefore issued recommendation S.100 (later T.100), noting the points of similarity but the essential incompatibility of the systems, and declaring all four to be recognised options. Trying to kick-start the market, AT&T Corporation entered the fray, and in May 1981 announced its own Presentation Layer Protocol (PLP). This was closely based on the Canadian Telidon system, but added to it some further graphics primitives and a syntax for defining macros, algorithms to define cleaner pixel spacing for the (arbitrarily sizeable) text, and also dynamically redefinable characters and a mosaic block graphic character set, so that it could reproduce content from the French Antiope. After some further revisions this was adopted in 1983 as ANSI standard X3.110, more commonly called NAPLPS, the North American Presentation Layer Protocol Syntax. It was also adopted in 1988 as the presentation-layer syntax for NABTS, the North American Broadcast Teletext Specification. Meanwhile, the European national Postal Telephone and Telegraph (PTT) agencies were also increasingly interested in videotex, and had convened discussions in European Conference of Postal and Telecommunications Administrations (CEPT) to co-ordinate developments, which had been diverging along national lines. As well as the British and French standards, the Swedes had proposed extending the British Prestel standard with a new se

    Read more →
  • Retrieval-augmented generation

    Retrieval-augmented generation

    Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave

    or base64 encoded elements

    Read more →
  • Keith Youngin George II

    Keith Youngin George II

    Keith "Youngin" George II is a former mixtape DJ, music executive, manager, producer, and technology app director. He has collaborated with Maino, T-Pain, Nas and Soulja Boy, among others. He was instrumental in the launch of social media app and website, Kandiid in 2021 and served as Fliiks App Director of Regional Development. == Career == Keith Anthony George II was born in Upper Heyford, Oxfordshire, England. His father was in the Air Force which exposed him to different cultures and music. He graduated from Allen High School and attended San Antonio College. George's music career began in 2006 as a mixtape DJ working as DJ Youngin Beatz. He performed at various shows and worked with a variety of artists, managers, and music executives. In 2007, George released the mixtape, Untapped market Vol. 1 (Da Underdogz), which featured tracks from artists including Kanye West, Lil Wayne, 50 Cent, Yung Berg, and Nelly. In 2008, he began working with Def Jam executive Sarah Alminawi who was managing Maino at the time. George played a key role in the marketing and promotional success of Maino's single, Hi Hater, which peaked at #8 on Billboard's US Bubbling Under Hot 100 chart. In 2021, George was an advisor and infrastructure head at Kandiid, a social media app which won a W3 Award in 2022. In 2023, he became involved with Fliiks App as Director of Regional Development which earned a Telly Award, two Muse Awards, and a W3 Award in 2025. In 2025, George was a composer and producer on two singles on Sekou Andrews's album, Koumami; The Chosen One: ACT 1 (featuring Lion Babe) and Love Don't Care (featuring Jordin Sparks and Omari Hardwick). In 2025, he was awarded an Atlanta City Proclamation for Philanthropy and Community Leadership for his partnership with Women's International Grail, a nonprofit organization that assists women, single mothers, and low-income families. He also collaborates with local youth programs, creative networks, and minority-owned startups, providing access to mentorship and industry knowledge. == Awards ==

    Read more →
  • Microsoft Teams

    Microsoft Teams

    Microsoft Teams is a team collaboration platform developed by Microsoft as part of the Microsoft 365 suite. It offers features such as workspace chat, video conferencing, file storage, and integration with both Microsoft and third-party applications and services. Teams gradually replaced earlier Microsoft messaging and collaboration platforms, including Skype for Business, Skype, Flip, and Microsoft Classroom. The platform saw significant growth during the COVID-19 pandemic, alongside competitors such as Zoom, Slack, and Google Meet, as organizations shifted to remote work and virtual meetings. As of January 2023, Microsoft reported approximately 280 million monthly active users. == History == On August 29, 2007, Microsoft acquired Parlano, the developer of the persistent group chat tool MindAlign. Years later, on March 4, 2016, Microsoft considered acquiring Slack for $8 billion. However, the proposal was reportedly opposed by Bill Gates, who advocated for focusing on enhancing Skype for Business instead. Lu Qi, then executive vice president of Applications and Services, had led the initiative to pursue the Slack acquisition. Following Lu's departure later that year, Microsoft announced Microsoft Teams on November 2, 2016, at an event in New York City, positioning it as a direct competitor to Slack. Teams launched worldwide on March 14, 2017. The service was initially led by corporate vice president Brian MacDonald. In response to the launch, Slack published a full-page advertisement in The New York Times welcoming the competition and outlining its product philosophy. Although Slack was used by 28 companies in the Fortune 100, The Verge wrote that executives would question paying for the service if Teams provides a similar function in their company's existing Office 365 subscription. However, ZDNET noted that the platforms initially served different markets, as Teams did not support external users, making it less appealing to small businesses and freelancers, a limitation Microsoft later addressed. In response to Teams' announcement, Slack deepened in-product integration with Google services. In May 2017, Microsoft announced that Teams would replace Microsoft Classroom in Office 365 Education. A free version of Teams was released on July 12, 2018, offering most core features at no cost, albeit with limits on users and storage. In January 2019, Microsoft introduced updates targeting "Firstline Workers" to improve Teams’ performance across shared or limited-access devices. In September 2019, Microsoft announced the retirement of Skype for Business in favor of Teams, which took effect on July 31, 2021. In early 2020, Microsoft introduced a push-to-talk "Walkie Talkie" feature aimed at firstline workers using smartphones and tablets over Wi-Fi or cellular networks. The COVID-19 pandemic significantly boosted usage of Teams. On March 19, 2020, Microsoft reported 44 million daily active users. In April, the platform logged 4.1 billion meeting minutes in a single day. A public preview of Microsoft Teams for Linux was released in December 2019, but the Linux client was discontinued in 2022. In July 2020, Microsoft shut down its video game livestreaming platform Mixer, and announced that some of its technologies would be repurposed for use in Teams. On February 28, 2025, Microsoft announced that Skype would be fully retired on May 5, 2025, with users given options to export their data or transition to Microsoft Teams. In October 2025, together with other Microsoft 365 suite apps, Teams had its logo updated. == Usage == == Underlying software == Microsoft Teams, as part of the Microsoft 365 suite, utilizes SharePoint and Exchange Online. Each Team, Shared Channel, and Private Channel has its own Microsoft 365 Group and SharePoint Site used for file storage. Messages are stored in Cosmos DB and are journaled to Exchange Online mailboxes. Private messages, including messages in Private Channels, are journaled to the sender and recipients' mailboxes. Public Channel messages are journaled to their corresponding Team's group mailbox, whereas, messages from Shared Channels are journaled to their own mailboxes. Contacts and voicemail are stored in Exchange Online. Microsoft Teams client is a web-based desktop app, originally developed on top of the Electron framework which combines the Chromium rendering engine and the Node.js JavaScript platform. Version 2.0 client was rebuilt using the Evergreen version of Microsoft Edge WebView2 in place of Electron. == Features == === Chats === Teams allows users to communicate in two-way persistent chats with one or multiple participants. Participants can message using text, emojis, stickers and gifs, as well as sharing links and files. In August 2022, the chat feature was updated for "chat with yourself"; allowing for the organization of files, notes, comments, images, and videos within a private chat tab. === Teams === Teams allows communities, groups, or teams to contribute in a shared workspace where messages and digital content on a specific topic are shared. Team members can join through an invitation sent by a team administrator or owner or sharing of a specific URL. Teams for Education allows admins and teachers to set up groups for classes, professional learning communities (PLCs), staff members, and everyone. === Channels === Channels allow team members to communicate without the use of email or group SMS (texting). Users can reply to posts with text, images, GIFs, and image macros. Direct messages send private messages to designated users rather than the entire channel. Connectors can be used within a channel to submit information contacted through a third-party service. Connectors include Mailchimp, Facebook Pages, Twitter, Power BI and Bing News. === Group conversations === Ad-hoc groups can be created to share instant messaging, audio calls (VoIP), and video calls inside the client software. === Telephone replacement === A feature on one of the higher cost licencing tiers allows connectivity to the public switched telephone network (PSTN) telephone system. This allows users to use Teams as if it were a telephone, making and receiving calls over the PSTN, including the ability to host "conference calls" with multiple participants. === Meeting === Meetings can be scheduled with multiple participants able to share audio, video, chat and presented content with all participants. Multiple users can connect via a meeting link. Automated minutes are possible using the recording and transcript features. Teams has a plugin for Microsoft Outlook to schedule a Teams Meeting in Outlook for a specific date and time and invite others to attend. If a meeting is scheduled within a channel, users visiting the channel are able to see if a meeting is in progress. ==== Teams Live Events ==== Teams Live Events replaces Skype Meeting Broadcast for users to broadcast to 10,000 participants on Teams, Yammer, or Microsoft Stream. ==== Breakout Rooms ==== Breakout rooms split a meeting into small groups. This is often utilized for collaboration during trainings or any environment where having all participants speak at once could be disruptive or unfeasible. Breakout rooms can be set by the hosts to a certain length of time, after which all participants will automatically rejoin the main meeting room. ==== Front Row ==== Front Row adjusts the layout of the viewer's screen, placing the speaker or content in the center of the gallery with other meeting participant's video feeds reduced in size and located below the speaker. === Education === Microsoft Teams for Education allows teachers to distribute, provide feedback, and grade student assignments turned in via Teams using the Assignments tab through Office 365 for Education subscribers. Quizzes can also be assigned to students through an integration with Office Forms. === Protocols === Microsoft Teams is based on a number of Microsoft-specific protocols. Video conferences are realized over the protocol MNP24, known from the Skype consumer version. VoIP and video conference clients based on SIP and H.323 need special gateways to connect to Microsoft Teams servers. With the help of Interactive Connectivity Establishment (ICE), clients behind Network address translation routers and restrictive firewalls are also able to connect, if peer-to-peer is not possible. === Integrations === Microsoft Teams has integrations through Microsoft AppSource, its integration marketplace. In 2020, Microsoft partnered with KUDO, a cloud-based solution with language interpretation, to allow integrated language meeting controls. In June 2022, an update was released using AI to improve call audio through the elimination of background feedback loops and cancelling non-vocal audio. == Anti-trust controversy == In July 2023, the European Commission opened an anti-trust investigation into the possibility that Microsoft unfairly used its office suite market power to increase sales of Teams and hurt

    Read more →
  • Dabbler

    Dabbler

    Dabbler is natural media drawing software for beginners. It was initially developed by Fractal Design Corporation. It is a simplified version of Fractal Design Painter, and included multimedia tutorials and a fullscreen interface. Dabbler was released as "Art Dabbler" after the MetaCreations merger, and rights were eventually transferred to Corel. Dabbler operating systems are Mac OS and Microsoft Windows.

    Read more →
  • Visual descriptor

    Visual descriptor

    In computer vision, visual descriptors or image descriptors are descriptions of the visual features of the contents in images, videos, or algorithms or applications that produce such descriptions. They describe elementary characteristics such as the shape, the color, the texture or the motion, among others. == Introduction == As a result of the new communication technologies and the massive use of Internet in our society, the amount of audio-visual information available in digital format is increasing considerably. Therefore, it has been necessary to design some systems that allow us to describe the content of several types of multimedia information in order to search and classify them. The audio-visual descriptors are in charge of the contents description. These descriptors have a good knowledge of the objects and events found in a video, image or audio and they allow the quick and efficient searches of the audio-visual content. This system can be compared to the search engines for textual contents. Although it is relatively easy to find text with a computer, it is much more difficult to find concrete audio and video parts. For instance, imagine somebody searching a scene of a happy person. The happiness is a feeling and it is not evident its shape, color and texture description in images. The description of the audio-visual content is not a superficial task and it is essential for the effective use of this type of archives. The standardization system that deals with audio-visual descriptors is the MPEG-7 (Motion Picture Expert Group - 7). == Types == Descriptors are the first step to find out the connection between pixels contained in a digital image and what humans recall after having observed an image or a group of images after some minutes. Visual descriptors are divided in two main groups: General information descriptors: contain low level descriptors which give a description about color, shape, regions, textures and motion. Specific domain information descriptors: give information about objects and events in the scene. A concrete example would be face recognition. === General information descriptors === General information descriptors consist of a set of descriptors that covers different basic and elementary features like: color, texture, shape, motion, location and others. This description is automatically generated by means of signal processing. ==== Color ==== It's the most basic quality of visual content. Five tools are defined to describe color. The three first tools represent the color distribution and the last ones describe the color relation between sequences or group of images: Dominant color descriptor (DCD) Scalable color descriptor (SCD) Color structure descriptor (CSD) Color layout descriptor (CLD) Group of frame (GoF) or group-of-pictures (GoP) ==== Texture ==== It's an important quality in order to describe an image. The texture descriptors characterize image textures or regions. They observe the region homogeneity and the histograms of these region borders. The set of descriptors is formed by: Homogeneous texture descriptor (HTD) Texture browsing descriptor (TBD) Edge histogram descriptor (EHD) ==== Shape ==== It contains important semantic information due to human's ability to recognize objects through their shape. However, this information can only be extracted by means of a segmentation similar to the one that the human visual system implements. Nowadays, such a segmentation system is not available yet, however there exists a serial of algorithms which are considered to be a good approximation. These descriptors describe regions, contours and shapes for 2D images and for 3D volumes. The shape descriptors are the following ones: Region-based shape descriptor (RSD) Contour-based shape descriptor (CSD) 3-D shape descriptor (3-D SD) ==== Motion ==== It's defined by four different descriptors which describe motion in video sequence. Motion is related to the objects motion in the sequence and to the camera motion. This last information is provided by the capture device, whereas the rest is implemented by means of image processing. The descriptor set is the following one: Motion activity descriptor (MAD) Camera motion descriptor (CMD) Motion trajectory descriptor (MTD) Warping and parametric motion descriptor (WMD and PMD) ==== Location ==== Elements location in the image is used to describe elements in the spatial domain. In addition, elements can also be located in the temporal domain: Region locator descriptor (RLD) Spatio temporal locator descriptor (STLD) === Specific domain information descriptors === These descriptors, which give information about objects and events in the scene, are not easily extractable, even more when the extraction is to be automatically done. Nevertheless, they can be manually processed. As mentioned before, face recognition is a concrete example of an application that tries to automatically obtain this information. == Descriptors applications == Among all applications, the most important ones are: Multimedia documents search engines and classifiers. Digital library: visual descriptors allow a very detailed and concrete search of any video or image by means of different search parameters. For instance, the search of films where a known actor appears, the search of videos containing the Everest mountain, etc. Personalized electronic news service. Possibility of an automatic connection to a TV channel broadcasting a soccer match, for example, whenever a player approaches the goal area. Control and filtering of concrete audiovisual content, like violent or pornographic material. Also, authorization for some multimedia content.

    Read more →
  • Ugly duckling theorem

    Ugly duckling theorem

    The ugly duckling theorem is an argument showing that classification is not really possible without some sort of bias. More particularly, it assumes finitely many properties combinable by logical connectives, and finitely many objects; it asserts that any two different objects share the same number of (extensional) properties. The theorem is named after Hans Christian Andersen's 1843 story "The Ugly Duckling", because it shows that a duckling is just as similar to a swan as two swans are to each other. It was derived by Satosi Watanabe in 1969. == Mathematical formula == Suppose there are n things in the universe, and one wants to put them into classes or categories. One has no preconceived ideas or biases about what sorts of categories are "natural" or "normal" and what are not. So one has to consider all the possible classes that could be, all the possible ways of making a set out of the n objects. There are 2 n {\displaystyle 2^{n}} such ways, the size of the power set of n objects. One can use that to measure the similarity between two objects, and one would see how many sets they have in common. However, one cannot. Any two objects have exactly the same number of classes in common if we can form any possible class, namely 2 n − 1 {\displaystyle 2^{n-1}} (half the total number of classes there are). To see this is so, one may imagine each class is represented by an n-bit string (or binary encoded integer), with a zero for each element not in the class and a one for each element in the class. As one finds, there are 2 n {\displaystyle 2^{n}} such strings. As all possible choices of zeros and ones are there, any two bit-positions will agree exactly half the time. One may pick two elements and reorder the bits so they are the first two, and imagine the numbers sorted lexicographically. The first 2 n / 2 {\displaystyle 2^{n}/2} numbers will have bit #1 set to zero, and the second 2 n / 2 {\displaystyle 2^{n}/2} will have it set to one. Within each of those blocks, the top 2 n / 4 {\displaystyle 2^{n}/4} will have bit #2 set to zero and the other 2 n / 4 {\displaystyle 2^{n}/4} will have it as one, so they agree on two blocks of 2 n / 4 {\displaystyle 2^{n}/4} or on half of all the cases, no matter which two elements one picks. So if we have no preconceived bias about which categories are better, everything is then equally similar (or equally dissimilar). The number of predicates simultaneously satisfied by two non-identical elements is constant over all such pairs. Thus, some kind of inductive bias is needed to make judgements to prefer certain categories over others. === Boolean functions === Let x 1 , x 2 , … , x n {\displaystyle x_{1},x_{2},\dots ,x_{n}} be a set of vectors of k {\displaystyle k} booleans each. The ugly duckling is the vector which is least like the others. Given the booleans, this can be computed using Hamming distance. However, the choice of boolean features to consider could have been somewhat arbitrary. Perhaps there were features derivable from the original features that were important for identifying the ugly duckling. The set of booleans in the vector can be extended with new features computed as boolean functions of the k {\displaystyle k} original features. The only canonical way to do this is to extend it with all possible Boolean functions. The resulting completed vectors have 2 k {\displaystyle 2^{k}} features. The ugly duckling theorem states that there is no ugly duckling because any two completed vectors will either be equal or differ in exactly half of the features. Proof. Let x and y be two vectors. If they are the same, then their completed vectors must also be the same because any Boolean function of x will agree with the same Boolean function of y. If x and y are different, then there exists a coordinate i {\displaystyle i} where the i {\displaystyle i} -th coordinate of x {\displaystyle x} differs from the i {\displaystyle i} -th coordinate of y {\displaystyle y} . Now the completed features contain every Boolean function on k {\displaystyle k} Boolean variables, with each one exactly once. Viewing these Boolean functions as polynomials in k {\displaystyle k} variables over GF(2), segregate the functions into pairs ( f , g ) {\displaystyle (f,g)} where f {\displaystyle f} contains the i {\displaystyle i} -th coordinate as a linear term and g {\displaystyle g} is f {\displaystyle f} without that linear term. Now, for every such pair ( f , g ) {\displaystyle (f,g)} , x {\displaystyle x} and y {\displaystyle y} will agree on exactly one of the two functions. If they agree on one, they must disagree on the other and vice versa. (This proof is believed to be due to Watanabe.) == Discussion == A possible way around the ugly duckling theorem would be to introduce a constraint on how similarity is measured by limiting the properties involved in classification, for instance, between A and B. However Medin et al. (1993) point out that this does not actually resolve the arbitrariness or bias problem since in what respects A is similar to B: "varies with the stimulus context and task, so that there is no unique answer, to the question of how similar is one object to another". For example, "a barberpole and a zebra would be more similar than a horse and a zebra if the feature striped had sufficient weight. Of course, if these feature weights were fixed, then these similarity relations would be constrained". Yet the property "striped" as a weight 'fix' or constraint is arbitrary itself, meaning: "unless one can specify such criteria, then the claim that categorization is based on attribute matching is almost entirely vacuous". Stamos (2003) remarked that some judgments of overall similarity are non-arbitrary in the sense they are useful: "Presumably, people's perceptual and conceptual processes have evolved that information that matters to human needs and goals can be roughly approximated by a similarity heuristic... If you are in the jungle and you see a tiger but you decide not to stereotype (perhaps because you believe that similarity is a false friend), then you will probably be eaten. In other words, in the biological world stereotyping based on veridical judgments of overall similarity statistically results in greater survival and reproductive success." Unless some properties are considered more salient, or 'weighted' more important than others, everything will appear equally similar, hence Watanabe (1986) wrote: "any objects, in so far as they are distinguishable, are equally similar". In a weaker setting that assumes infinitely many properties, Murphy and Medin (1985) give an example of two putative classified things, plums and lawnmowers: "Suppose that one is to list the attributes that plums and lawnmowers have in common in order to judge their similarity. It is easy to see that the list could be infinite: Both weigh less than 10,000 kg (and less than 10,001 kg), both did not exist 10,000,000 years ago (and 10,000,001 years ago), both cannot hear well, both can be dropped, both take up space, and so on. Likewise, the list of differences could be infinite… any two entities can be arbitrarily similar or dissimilar by changing the criterion of what counts as a relevant attribute." According to Woodward, the ugly duckling theorem is related to Schaffer's Conservation Law for Generalization Performance, which states that all algorithms for learning of boolean functions from input/output examples have the same overall generalization performance as random guessing. The latter result is generalized by Woodward to functions on countably infinite domains.

    Read more →
  • Pyramid (image processing)

    Pyramid (image processing)

    Pyramid, or pyramid representation, is a type of multi-scale signal representation developed by the computer vision, image processing and signal processing communities, in which a signal or an image is subject to repeated smoothing and subsampling. Pyramid representation is a predecessor to scale-space representation and multiresolution analysis. == Pyramid generation == There are two main types of pyramids: lowpass and bandpass. A lowpass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction. The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times. Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution). If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle's resulting smaller image stacked one atop the other. A bandpass pyramid is made by forming the difference between images at adjacent levels in the pyramid and performing image interpolation between adjacent levels of resolution, to enable computation of pixelwise differences. == Pyramid generation kernels == A variety of different smoothing kernels have been proposed for generating pyramids. Among the suggestions that have been given, the binomial kernels arising from the binomial coefficients stand out as a particularly useful and theoretically well-founded class. Thus, given a two-dimensional image, we may apply the (normalized) binomial filter (1/4, 1/2, 1/4) typically twice or more along each spatial dimension and then subsample the image by a factor of two. This operation may then proceed as many times as desired, leading to a compact and efficient multi-scale representation. If motivated by specific requirements, intermediate scale levels may also be generated where the subsampling stage is sometimes left out, leading to an oversampled or hybrid pyramid. With the increasing computational efficiency of CPUs available today, it is in some situations also feasible to use wider supported Gaussian filters as smoothing kernels in the pyramid generation steps. === Gaussian pyramid === In a Gaussian pyramid, subsequent images are weighted down using a Gaussian average (Gaussian blur) and scaled down. Each pixel containing a local average corresponds to a neighborhood pixel on a lower level of the pyramid. This technique is used especially in texture synthesis. === Laplacian pyramid === A Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each levels. Only the smallest level is not a difference image to enable reconstruction of the high resolution image using the difference images on higher levels. This technique can be used in image compression. === Steerable pyramid === A steerable pyramid, developed by Simoncelli and others, is an implementation of a multi-scale, multi-orientation band-pass filter bank used for applications including image compression, texture synthesis, and object recognition. It can be thought of as an orientation selective version of a Laplacian pyramid, in which a bank of steerable filters are used at each level of the pyramid instead of a single Laplacian or Gaussian filter. == Applications of pyramids == === Alternative representation === In the early days of computer vision, pyramids were used as the main type of multi-scale representation for computing multi-scale image features from real-world image data. More recent techniques include scale-space representation, which has been popular among some researchers due to its theoretical foundation, the ability to decouple the subsampling stage from the multi-scale representation, the more powerful tools for theoretical analysis as well as the ability to compute a representation at any desired scale, thus avoiding the algorithmic problems of relating image representations at different resolution. Nevertheless, pyramids are still frequently used for expressing computationally efficient approximations to scale-space representation. === Detail manipulation === Levels of a Laplacian pyramid can be added to or removed from the original image to amplify or reduce detail at different scales. However, detail manipulation of this form is known to produce halo artifacts in many cases, leading to the development of alternatives such as the bilateral filter. Some image compression file formats use the Adam7 algorithm or some other interlacing technique. These can be seen as a kind of image pyramid. Because those file format store the "large-scale" features first, and fine-grain details later in the file, a particular viewer displaying a small "thumbnail" or on a small screen can quickly download just enough of the image to display it in the available pixels—so one file can support many viewer resolutions, rather than having to store or generate a different file for each resolution.

    Read more →
  • Inpainting

    Inpainting

    Inpainting is a conservation process where damaged, deteriorated, or missing parts of an artwork are filled in to present a complete image. This process is commonly used in image restoration. It can be applied to both physical and digital art mediums such as oil or acrylic paintings, chemical photographic prints, sculptures, or digital images and video. With its roots in physical artwork, such as painting and sculpture, traditional inpainting is performed by a trained art conservator who has carefully studied the artwork to determine the mediums and techniques used in the piece, potential risks of treatments, and ethical appropriateness of treatment. == History == The modern use of inpainting can be traced back to Pietro Edwards (1744–1821), Director of the Restoration of the Public Pictures in Venice, Italy. Using a scientific approach, Edwards focused his restoration efforts on the intentions of the artist. It was during the 1930 International Conference for the Study of Scientific Methods for the Examination and Preservation of Works of Art, that the modern approach to inpainting was established. Helmut Ruhemann (1891–1973), a German restorer and conservator, led the discussions on the use of inpainting in conservation. Helmut Ruhemann was a leading figure in modernizing restoration and conservation. His greatest contribution to the field of conservation "was his insistence on following the methods of the original painter exactly, and on understanding the painter's artistic intention". After his career of over 40 years as a conservator, Ruhemann published his treatise The Cleaning of Paintings: Problems & Potentialities in 1968. In describing his method, Ruhemann states that "The surface [of the fill] should be slightly lower than that of the surrounding paint to allow for the thickness of the inpainting...Inpainting medium should look and behave like the original medium, but must not darken with age." Cesare Brandi (1906–1988) developed the teoria del restauro, the inpainting approach combining aesthetics and psychology. However, this approach was used primarily by Italian restorers and conservators, with the terminology becoming widespread in the 1990s. Technological advancements led to new applications of inpainting. Widespread use of digital techniques range from entirely automatic computerized inpainting to tools used to simulate the process manually. Since the mid-1990s, the process of inpainting has evolved to include digital media. More commonly known as image or video interpolation, a form of estimation, digital inpainting includes the use of computer software that relies on sophisticated algorithms to replace lost or corrupted parts of the image data. == Ethics == In order to preserve the integrity of an original artwork, any inpainting technique or treatment applied to physical or digital work should be reversible or distinguishable from the original content of the artwork. Prior to any treatments, conservators proceed according to the American Institute of Conservation of Historical and Artistic Works. There are several ethic considerations before Inpainting can be justified. Various deliberation decisions over the ethical appropriateness of the amount and type of inpainting done, resides on many factors. As most conservation treatments, inpainting's ethical questions rest mainly with authenticity, reversibility and documentation.Any intervention to compensate for loss should be documented in treatment records and reports and should be detectable by common examination methods. Such compensation should be reversible and should not falsely modify the known aesthetic, conceptual, and physical characteristics of the cultural property, especially by removing or obscuring original material.New technologies and the aesthetic demand for perfect images without imperfections challenge conservators' ethical practices to protect the integrity of originals. == Methods == Inpainting methods and techniques depend on the desired goal and type of image being treated. Treatments to fill in the gaps are different between physical and digital art. In inpainting, detailed records of the initial state of the images can help with the treatment and replicate the original closer. === Physical inpainting === Inpainting is rooted in the conservation and restoration of paintings. Inpainting can aim to make a visual improvement to the artwork as a whole by repairing missing or damaged parts using methods and materials equivalent to the original artist's work. ==== Application techniques ==== By studying the painting methods of various artists and the composition of paints used historically, conservators are able to restore works very closely to their original visual appearance. The picture as a whole determines how to fill in the gap. Helmut Ruhemann's inpainting techniques by Jessell have procedures to "preserve" the quality of oil and tempera paintings. === Digital inpainting === Many programs are able to reconstruct missing or damaged areas of digital photographs and videos. Most widely known for use with digital images is Adobe Photoshop. Given the various abilities of the digital camera and the digitization of old photos, inpainting has become an automatic process that can be performed on digital images. The inpainting techniques can be applied to object removal, text removal, and other automatic modifications of images and videos. In video special effects, inpainting is usually performed after video matting. They can also be observed in applications like image compression and super-resolution. In photography and cinema, it is used for film restoration to reverse, repair, or mitigate deterioration (e.g., physical damage such as cracks in photographs, scratches and dust spots in film, or chemical damage resulting in image loss; performed infrared cleaning). It can also be used for removing red-eye, the stamped date from photographs, and objects for creative effect. This technique can be used to replace any lost blocks in the coding and transmission of images, for example, in a streaming video. It can also be used to remove logos or watermarks in videos. Deep learning neural network-based inpainting can be used for decensoring images. Deep image prior-based techniques can be used for digital image inpainting, where a trained deep learning model is either unavailable or infeasible. Deep models for visual content generation, like text-to-image or text-to-video, learn complex priors over the distribution of visual content, and can be used to inpaint missing parts. For example, videos can be separated into layers, using a technique called omnimatte, which either pretrain an omnimatte model or without any training using an omnimatte-zero model. Three main groups of 2D image-inpainting algorithms can be found in the literature. The first one to be noted is structural (or geometric) inpainting, the second one is texture inpainting, the last one is a combination of these two techniques. They use the information of the known or non-destroyed image areas in order to fill the gap, similar to how physical images are restored. ==== Structural ==== Structural or geometric inpainting is used for smooth images that have strong, defined borders. There are many different approaches to geometric inpainting, but they all come from the idea that geometry can be recovered from similar areas or domains. Bertalmio proposed a method of structural inpainting that mimics how conservators address painting restoration. Bertalmio proposed that by progressively transferring similar information from the borders of an inpainting domain inwards, the gap can be filled. ==== Textural ==== While structural/geometric inpainting works to repair smooth images, textural inpainting works best with images that are heavily textured. Texture has a repetitive pattern which means that a missing portion cannot be restored by continuing the level lines into the gap; level lines provide a complete, stable representation of an image. To repair texture in an image, one can combine frequency and spatial domain information to fill in a selected area with a desired texture. This method, while the most simple and very effective, works well when selecting a texture to be in-painted. For a texture that covers a wider area or a larger frame one would have to go through the image segmenting the areas to be in-painted and selecting the corresponding textures from throughout the image; there are programs that can help find the corresponding areas that work in a similar way as 'find and replace' works in a word processor. ==== Combined structural and textural ==== Combined structural and textural inpainting approaches simultaneously try to perform texture- and structure-filling in regions of missing image information. Most parts of an image consist of texture and structure and the boundaries between image regions contain a large amount of structural information. This is the result when blending differ

    Read more →
  • Trigram

    Trigram

    Trigrams are a special case of the n-gram, where n is 3. They are often used in natural language processing for performing statistical analysis of texts and in cryptography for control and use of ciphers and codes. See results of analysis of "Letter Frequencies in the English Language". == Frequency == Context is very important, varying analysis rankings and percentages are easily derived by drawing from different sample sizes, different authors; or different document types: poetry, science-fiction, technology documentation; and writing levels: stories for children versus adults, military orders, and recipes. Typical cryptanalytic frequency analysis finds that the 16 most common character-level trigrams in English are: Because encrypted messages sent by telegraph often omit punctuation and spaces, cryptographic frequency analysis of such messages includes trigrams that straddle word boundaries. This causes trigrams such as "edt" to occur frequently, even though it may never occur in any one word of those messages. == Examples == The sentence "the quick red fox jumps over the lazy brown dog" has the following word-level trigrams: the quick red quick red fox red fox jumps fox jumps over jumps over the over the lazy the lazy brown lazy brown dog And the word-level trigram "the quick red" has the following character-level trigrams (where an underscore "_" marks a space): the he_ e_q _qu qui uic ick ck_ k_r _re red

    Read more →
  • Huawei Mobile Services

    Huawei Mobile Services

    Huawei Mobile Services (HMS) is a collection of proprietary services and high level application programming interfaces (APIs) developed by Huawei Technologies Co., Ltd. Its hub known as HMS Core serves as a toolkit for app development on Huawei devices. HMS is typically installed on Huawei devices on top of running HarmonyOS 4.x and earlier operating system on its earlier devices running the Android operating system with EMUI including devices already distributed with Google Mobile Services. Alongside, HMS Core Wear Engine for Android phones with lightweight based LiteOS wearable middleware app framework integration connectivity like notifications, status etc. HMS consists of seven key services and the HMS Core. The key services are Huawei ID, Huawei Cloud, AppGallery, Themes, Huawei Video, Browser, and Assistant. The web browser is based on Chromium. Huawei Quick Apps is the alternative to Google Instant Apps. By January 2020, over 50,000 apps had been integrated with HMS Core. Its rival, Google Mobile Services has 3 million apps on Google's Play Store. The AppGallery claimed 180 billion downloads in 2019. In March 2020, HMS was used by 650 million monthly active users across 170 countries. A Chinese phone manufacturer, LeTV, hosted a smartphone business communication meeting in Beijing on September 27, 2021, to demonstrate its phone, the LeTV S1. This was the first smartphone from a third-party manufacturer to include Huawei Mobile Services (HMS). == HMS on Android and HarmonyOS == Huawei Mobile Services on Android goes all the way back to August 2016 as Huawei ID services for phones, basic functionalities for Huawei P9 series. However, in May 2019 proved to be a significant change to HMS when Google was prohibited from working with Huawei on any new devices extending ecosystem for AppGallery store front launched in April 2018, year prior. This also included bundling Google's Apps, including Gmail, Maps and YouTube. Any new Huawei devices launched after 16 May 2019 were unable to receive updates from Google services and would be considered 'uncertified' meaning Huawei's only solution at the time was to turn HMS into a genuine competitor to Google and incentivize app developers to utilize the platform. Huawei officially launched Huawei Mobile Services in China on December 24, 2019, as a beta. Huawei expanded Huawei Mobile Services in Europe in February 2020 and other markets in Asia, Latin America, Middle East & Africa, Canada, Mexico followed outside banned US market. HMS is available on the Honor 9X Pro, View 30 Pro, Huawei Mate XS. HMS is also available, alongside GMS, on many other Huawei models launched before the ban. Huawei promised developers it would take, “less than 10 minutes", to port their app over to HMS - to illustrate the ease of portability between Google's Play Store and the HMS AppGallery. On January 15, 2020, HMS Core 4.0 (Huawei Mobile Services Core 4.0) was officially launched. Huawei announced that at this time, there were already 1.3 million developers and 55,000 applications on board. The next day, Huawei held a developer day event in London and invested £20 million to encourage developers in the United Kingdom and Ireland to use HMS. On July 15, 2021, Huawei expanded HMS with classic HarmonyOS dual-framework that provided Java support and eventually with JavaScript and ArkTS (eTS) language support with HMS Core 6.0 for app development with primarily Android apps, alongside limited HAP imperative developed based apps that shares AOSP file system libraries in all types of devices from smartphones, tablets, smart screens, smartwatches, and car machines. Including various third-party development frameworks, such as React Native, Cordova, etc. At HDC 2023, Huawei unveiled HarmonyOS 5, marking a total break from the hybrid Android derived platform. This shift replaced the legacy Android and classic HarmonyOS-based HMS SDK with a full native API developer kit SDK built solely on OpenHarmony. The architecture moved from middleware services to vertical integration path. In this new model, HMS Core libraries are no longer external add-ons but are bundled directly into the system and DevEco Studio as native HarmonyOS Kits. == HMS Core == HMS Core is a hub for Huawei Mobile Services and serves as a toolkit for app development on Huawei devices. The core comprises Development, Growth and Monetizing and was created as a replacement for Google Mobile Services (GMS) Core. HMS core services were available in more than 55,000 apps in June 2020; HMS Core 5.0 debuted in September 2020. HMS Core 6.0 was launched in June 2021 with extended support for Huawei Cloud services. In June 2021, the number of registered developers within the HMS ecosystem was 4 million, and the number of apps integrated with the HMS Core had reached 134,000. As of July 2022, registered developers within HMS ecosystem had grown to 5 million, and the number of apps integrated with the HMS Core reached 203,000. The number of apps had grown to 220,000 by 30 September 2022. == AppGallery == The AppGallery has a key rival, Google's Play Store on Android. The AppGallery is available in 170 countries, across 78 languages. == Reception == The reception of HMS is mixed, with the majority of discussion based around the key Google/Android apps which are not yet present on the AppGallery and whether or not this presents a significant problem to users. The open development of HMS Core has been regarded by some as benefiting the Android project as a whole, "If Huawei continues to invest in a holistically open approach ... the result could be that we could all end up a bit less beholden to Google".

    Read more →
  • Jais (language model)

    Jais (language model)

    Jais is an open-source large language model launched in August 2023. Developed as a collaboration between Emirati AI company G42, the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and US-based Cerebras Systems, Jais was designed to produce high-quality Arabic text and was also trained on English data. The model's creation was motivated by the underrepresentation of the Arabic language in the field of generative artificial intelligence. It aims to provide a more culturally and linguistically accurate model for the world's 400 million Arabic speakers. Its name is a reference to Jebel Jais, the highest mountain in the UAE. == Background and development == Jais was developed in response to the limited availability of advanced generative artificial intelligence models for the Arabic language, despite it being spoken by over 400 million people. Existing models were often trained on limited or low-quality Arabic web content, resulting in poor performance. The project represents a significant investment by the United Arab Emirates in the field of AI as part of its national strategy. The model was created through a partnership between Inception (now Core42), a subsidiary of the Abu Dhabi-based AI company G42; the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI); and Cerebras Systems, a US company specializing in AI hardware. The model is named after Jebel Jais, the highest peak in the UAE. == Training == The initial version of Jais released in August 2023 had 13 billion parameters. In November 2023, Core42 released Jais 30B, an improved version with 30 billion parameters. Both models were trained on a subset of the Cerebras Condor Galaxy 1 supercomputer. The training dataset consisted of a mix of Arabic, English, and computer code. According to Timothy Baldwin, a professor of natural language processing at MBZUAI, training the model on a diverse Arabic dataset allows it to switch between dialects. == Features == Jais is designed to generate text in both English and Arabic. The project has also released instruction-tuned "Chat" variants for both the 13B and 30B models, which are specifically optimized for conversational applications. Additional functionality for working with images, graphs, and tabular data is planned for future releases.

    Read more →
  • Workplace impact of artificial intelligence

    Workplace impact of artificial intelligence

    The impact of artificial intelligence on workers includes both applications to improve worker safety and health, and potential hazards that must be controlled. One potential application is using AI to eliminate hazards by removing humans from hazardous situations that involve risk of stress, overwork, or musculoskeletal injuries. Predictive analytics may also be used to identify conditions that may lead to hazards such as fatigue, repetitive strain injuries, or toxic substance exposure, leading to earlier interventions. Another is to streamline workplace safety and health workflows through automating repetitive tasks, enhancing safety training programs through virtual reality, or detecting and reporting near misses. When used in the workplace, AI also presents the possibility of new hazards. These may arise from machine learning techniques leading to unpredictable behavior and inscrutability in their decision-making, or from cybersecurity and information privacy issues. Many hazards of AI are psychosocial due to its potential to cause changes in work organization. These include increased monitoring leading to micromanagement, algorithms unintentionally or intentionally mimicking undesirable human biases, and assigning blame for machine errors to the human operator instead. AI may also lead to physical hazards in the form of human–robot collisions, and ergonomic risks of control interfaces and human–machine interactions. Hazard controls include cybersecurity and information privacy measures, communication and transparency with workers about data usage, and limitations on collaborative robots. From a workplace safety and health perspective, only "weak" or "narrow" AI that is tailored to a specific task is relevant, as there are many examples that are currently in use or expected to come into use in the near future. Certain digital technologies are predicted to result in job losses. Starting in the 2020s, the adoption of modern robotics has led to net employment growth. However, many businesses anticipate that automation, or employing robots would result in job losses in the future. This is especially true for companies in Central and Eastern Europe. Other digital technologies, such as platforms or big data, are projected to have a more neutral impact on employment. A large number of tech workers have been laid off starting in 2023; many such job cuts have been attributed to artificial intelligence. == Health and safety applications == In order for any potential AI health and safety application to be adopted, it requires acceptance by both managers and workers. For example, worker acceptance may be diminished by concerns about information privacy, or from a lack of trust and acceptance of the new technology, which may arise from inadequate transparency or training. Alternatively, managers may emphasize increases in economic productivity rather than gains in worker safety and health when implementing AI-based systems. === Eliminating hazardous tasks === AI may increase the scope of work tasks where a worker can be removed from a situation that carries risk. In a sense, while traditional automation can replace the functions of a worker's body with a robot, AI effectively replaces the functions of their brain with a computer. Hazards that can be avoided include stress, overwork, musculoskeletal injuries, and boredom. This can expand the range of affected job sectors into white-collar and service sector jobs such as in medicine, finance, and information technology. === Analytics to reduce risk === Machine learning is used for people analytics to make predictions about worker behavior to assist management decision-making, such as hiring and performance assessment. These could also be used to improve worker health. The analytics may be based on inputs such as online activities, monitoring of communications, location tracking, and voice analysis and body language analysis of filmed interviews. For example, sentiment analysis may be used to spot fatigue to prevent overwork. Decision support systems have a similar ability to be used to, for example, prevent industrial disasters or make disaster response more efficient. For manual material handling workers, predictive analytics and artificial intelligence may be used to reduce musculoskeletal injury. Traditional guidelines are based on statistical averages and are geared towards anthropometrically typical humans. The analysis of large amounts of data from wearable sensors may allow real-time, personalized calculation of ergonomic risk and fatigue management, as well as better analysis of the risk associated with specific job roles. Wearable sensors may also enable earlier intervention against exposure to toxic substances than is possible with area or breathing zone testing on a periodic basis. Furthermore, the large data sets generated could improve workplace health surveillance, risk assessment, and research. === Streamlining safety and health workflows === AI has also been used to attempt to make the workplace safety and health workflow more efficient. One example is coding of workers' compensation claims, which are submitted in a prose narrative form and must manually be assigned standardized codes. AI is being investigated to perform this task faster, more cheaply, and with fewer errors. == Hazards == There are several broad aspects of AI that may give rise to specific hazards. The risks depend on implementation rather than the mere presence of AI. Systems using sub-symbolic AI such as machine learning may behave unpredictably and are more prone to inscrutability in their decision-making. This is especially true if a situation is encountered that was not part of the AI's training dataset, and is exacerbated in environments that are less structured. Undesired behavior may also arise from flaws in the system's perception (arising either from within the software or from sensor degradation), knowledge representation and reasoning, or from software bugs. They may arise from improper training, such as a user applying the same algorithm to two problems that do not have the same requirements. Machine learning applied during the design phase may have different implications than that applied at runtime. Systems using symbolic AI are less prone to unpredictable behavior. The use of AI also increases cybersecurity risks relative to platforms that do not use AI, and information privacy concerns about collected data may pose a hazard to workers. === Psychosocial === Psychosocial hazards are those that arise from the way work is designed, organized, and managed, or its economic and social contexts, rather than arising from a physical substance or object. They cause not only psychiatric and psychological outcomes such as occupational burnout, anxiety disorders, and depression, but they can also cause physical injury or illness such as cardiovascular disease or musculoskeletal injury. Many hazards of AI are psychosocial in nature due to its potential to cause changes in work organization, in terms of increasing complexity and interaction between different organizational factors. However, psychosocial risks are often overlooked by designers of advanced manufacturing systems. Einola and Khoreva explore how different organizational groups perceive and interact with AI technologies. Their research shows that successful AI integration depends on human ownership and contextual understanding. They caution against blind technological optimism and stress the importance of tailoring AI use to specific workplace ecosystems. This perspective reinforces the need for inclusive design and transparent implementation strategies. ==== Changes in work practices ==== Over-reliance on AI tools may lead to deskilling of some professions. When AI becomes a substitute for traditional peer collaboration and mentorship, there is a risk of diminishing opportunities for interpersonal skill development and team-based learning. Increased monitoring may lead to micromanagement and thus to stress and anxiety. A perception of surveillance may also lead to stress. Controls for these include consultation with worker groups, extensive testing, and attention to introduced bias. Wearable sensors, activity trackers, and augmented reality may also lead to stress from micromanagement, both for assembly line workers and gig workers. Gig workers also lack the legal protections and rights of formal workers. Newell & Marabelli argue that AI alters power dynamics and employee autonomy, requiring a more nuanced understanding of its social and organizational implications. There is also the risk of people being forced to work at a robot's pace, or to monitor robot performance at nonstandard hours. A 2025 preprint paper based on users' interactions with the AI chatbot Microsoft Copilot identified forty jobs that the author's claimed had high overlaps with the capabilities of AI. Some media outlets used this paper to report on jobs becoming obsolete. Cri

    Read more →
  • Negobot

    Negobot

    Negobot also referred to as Lolita or Lolita chatbot is a chatterbot that was introduced to the public in 2013, designed by researchers from the University of Deusto and Optenet to catch online pedophiles. It is a conversational agent that utilizes natural language processing (NLP), information retrieval (IR) and Automatic Learning. Because the bot poses as a young female in order to entice and track potential predators, it became known in media as the "virtual Lolita", in reference to Vladimir Nabokov's novel. == Background == In 2013, the University of Deusto researchers published a paper on their work with Negobot and disclosed the text online. In their abstract, the researchers addressed the issue that an increasing number of children are using the internet and that these young users are more susceptible to existing internet risks. Their main objective was to create a chatterbot with the ability to trap online predators that posed a threat to children. They intended to deploy the bot into sites frequented by predators such as social networks and chatrooms. The university researchers used information provided by anti-pedophilia activist organization Perverted-Justice, including examples of online encounters and conversations with sexual predators, to supplement the program's artificial intelligence system. == Features == === Programmed persona === The chatterbot takes the guise of a naive and vulnerable 14-year-old girl. The bot's programmers used methods of artificial intelligence and natural language processing to create a conversational agent fluent in typical teenage slang, misspellings, and knowledge of pop culture. Through these linguistic features, the bot is able to mimic the conversational style of young teenagers. It also features split personalities and seven different patterns of conversation. Negobot's primary creator, Dr. Carlos Laorden, expressed the significance of the bot's distinguishable style of communication, stating that normally, "chatbots tend to be very predictable. Their behavior and interest in a conversation are flat, which is a problem when attempting to detect untrustworthy targets like paedophiles." What makes Negobot different is its game theory feature, which makes it able to "maintain a much more realistic conversation." Apart from being able to imitate a stereotypical teenager, the program is also able to translate messages into different languages. === Game theory === Negobot's designers programmed it with the ability to treat conversations with potential predators as if it were a game, the objective being to collect as much information on the suspect as possible that could provide evidence of pedophilic characteristics and motives. The use of game theory shapes the decisions the bot makes and the overall direction of the conversation. The bot initiates its undercover operations by entering a chat as a passive participant, waiting to be chatted by a user. Once a user elicits conversation, the bot will frame the conversation in such a way that keeps the target engaged, extracting personal information and discouraging it from leaving the chat. The information is then recorded to be potentially sent to the police. If the target seems to lose interest, the bot attempts to make it feel guilty by expressing sentiments of loneliness and emotional need through strategic, formulated responses, ultimately prolonging interaction. In addition, the bot may provide fake information about itself in attempt to lure the target into physical meetings. === Limitations === Despite being able to carry out a realistic conversation, Negobot is still unable to detect linguistic subtleties in the messages of others, including sarcasm. == Controversy == John Carr, a specialist in online child safety, expressed his concern to BBC over the legality of this undercover investigation. He claimed that using the bot on unsuspecting internet users could be considered a form of entrapment or harassment. The type of information that Negobot collects from potential online predators, he said, is unlikely to be upheld in court. Furthermore, he warned that relying on only software without any real-world policing risks enticing individuals to do or say things that they would not have if real-world policing were a factor.

    Read more →
  • Textual entailment

    Textual entailment

    In natural language processing, textual entailment (TE), also known as natural language inference (NLI), is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text. == Definition == In the TE framework, the entailing and entailed texts are termed text (t) and hypothesis (h), respectively. Textual entailment is not the same as pure logical entailment – it has a more relaxed definition: "t entails h" (t ⇒ h) if, typically, a human reading t would infer that h is most likely true. (Alternatively: t ⇒ h if and only if, typically, a human reading t would be justified in inferring the proposition expressed by h from the proposition expressed by t.) The relation is directional because even if "t entails h", the reverse "h entails t" is much less certain. Determining whether this relationship holds is an informal task, one which sometimes overlaps with the formal tasks of formal semantics (satisfying a strict condition will usually imply satisfaction of a less strict conditioned); additionally, textual entailment partially subsumes word entailment. == Examples == Textual entailment can be illustrated with examples of three different relations: An example of a positive TE (text entails hypothesis) is: text: If you help the needy, God will reward you. hypothesis: Giving money to a poor man has good consequences. An example of a negative TE (text contradicts hypothesis) is: text: If you help the needy, God will reward you. hypothesis: Giving money to a poor man has no consequences. An example of a non-TE (text does not entail nor contradict) is: text: If you help the needy, God will reward you. hypothesis: Giving money to a poor man will make you a better person. == Ambiguity of natural language == A characteristic of natural language is that there are many different ways to state what one wants to say: several meanings can be contained in a single text and the same meaning can be expressed by different texts. This variability of semantic expression can be seen as the dual problem of language ambiguity. Together, they result in a many-to-many mapping between language expressions and meanings. The task of paraphrasing involves recognizing when two texts have the same meaning and creating a similar or shorter text that conveys almost the same information. Textual entailment is similar but weakens the relationship to be unidirectional. Mathematical solutions to establish textual entailment can be based on the directional property of this relation, by making a comparison between some directional similarities of the texts involved. == Approaches == Textual entailment measures natural language understanding as it asks for a semantic interpretation of the text, and due to its generality remains an active area of research. Many approaches and refinements of approaches have been considered, such as word embedding, logical models, graphical models, rule systems, contextual focusing, and machine learning. Practical or large-scale solutions avoid these complex methods and instead use only surface syntax or lexical relationships, but are correspondingly less accurate. As of 2005, state-of-the-art systems are far from human performance; a study found humans to agree on the dataset 95.25% of the time. Algorithms from 2016 had not yet achieved 90%. == Applications == Many natural language processing applications, like question answering, information extraction, summarization, multi-document summarization, and evaluation of machine translation systems, need to recognize that a particular target meaning can be inferred from different text variants. Typically entailment is used as part of a larger system, for example in a prediction system to filter out trivial or obvious predictions. Textual entailment also has applications in adversarial stylometry, which has the objective of removing textual style without changing the overall meaning of communication. == Datasets == Some of available English NLI datasets include: SNLI MultiNLI SciTail SICK MedNLI QA-NLI In addition, there are several non-English NLI datasets, as follows: XNLI DACCORD, RTE3-FR, SICK-FR for French FarsTail for Farsi OCNLI for Chinese SICK-NL for Dutch IndoNLI for Indonesian

    Read more →