Transduction (machine learning)

Transduction (machine learning)

In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions. Transduction was introduced in a computer science context by Vladimir Vapnik in the 1990s, motivated by his view that transduction is preferable to induction since, according to him, induction requires solving a more general problem (inferring a function) before solving a more specific problem (computing outputs for new cases): "When solving a problem of interest, do not solve a more general problem as an intermediate step. Try to get the answer that you really need but not a more general one.". An example of learning which is not inductive would be in the case of binary classification, where the inputs tend to cluster in two groups. A large set of test inputs may help in finding the clusters, thus providing useful information about the classification labels. The same predictions would not be obtainable from a model which induces a function based only on the training cases. Some people may call this an example of the closely related semi-supervised learning, since Vapnik's motivation is quite different. The most well-known example of a case-bases learning algorithm is the k-nearest neighbor algorithm, which is related to transductive learning algorithms. Another example of an algorithm in this category is the Transductive Support Vector Machine (TSVM). A third possible motivation of transduction arises through the need to approximate. If exact inference is computationally prohibitive, one may at least try to make sure that the approximations are good at the test inputs. In this case, the test inputs could come from an arbitrary distribution (not necessarily related to the distribution of the training inputs), which wouldn't be allowed in semi-supervised learning. An example of an algorithm falling in this category is the Bayesian Committee Machine (BCM). == Historical context == The mode of inference from particulars to particulars, which Vapnik came to call transduction, was already distinguished from the mode of inference from particulars to generalizations in part III of the Cambridge philosopher and logician W.E. Johnson's 1924 textbook, Logic. In Johnson's work, the former mode was called 'eduction' and the latter was called 'induction'. Bruno de Finetti developed a purely subjective form of Bayesianism in which claims about objective chances could be translated into empirically respectable claims about subjective credences with respect to observables through exchangeability properties. An early statement of this view can be found in his 1937 La Prévision: ses Lois Logiques, ses Sources Subjectives and a mature statement in his 1970 Theory of Probability. Within de Finetti's subjective Bayesian framework, all inductive inference is ultimately inference from particulars to particulars. == Example problem == The following example problem contrasts some of the unique properties of transduction against induction. A collection of points is given, such that some of the points are labeled (A, B, or C), but most of the points are unlabeled (?). The goal is to predict appropriate labels for all of the unlabeled points. The inductive approach to solving this problem is to use the labeled points to train a supervised learning algorithm, and then have it predict labels for all of the unlabeled points. With this problem, however, the supervised learning algorithm will only have five labeled points to use as a basis for building a predictive model. It will certainly struggle to build a model that captures the structure of this data. For example, if a nearest-neighbor algorithm is used, then the points near the middle will be labeled "A" or "C", even though it is apparent that they belong to the same cluster as the point labeled "B", compared to semi-supervised learning. Transduction has the advantage of being able to consider all of the points, not just the labeled points, while performing the labeling task. In this case, transductive algorithms would label the unlabeled points according to the clusters to which they naturally belong. The points in the middle, therefore, would most likely be labeled "B", because they are packed very close to that cluster. An advantage of transduction is that it may be able to make better predictions with fewer labeled points, because it uses the natural breaks found in the unlabeled points. One disadvantage of transduction is that it builds no predictive model. If a previously unknown point is added to the set, the entire transductive algorithm would need to be repeated with all of the points in order to predict a label. This can be computationally expensive if the data is made available incrementally in a stream. Further, this might cause the predictions of some of the old points to change (which may be good or bad, depending on the application). A supervised learning algorithm, on the other hand, can label new points instantly, with very little computational cost. == Transduction algorithms == Transduction algorithms can be broadly divided into two categories: those that seek to assign discrete labels to unlabeled points, and those that seek to regress continuous labels for unlabeled points. Algorithms that seek to predict discrete labels tend to be derived by adding partial supervision to a clustering algorithm. Two classes of algorithms can be used: flat clustering and hierarchical clustering. The latter can be further subdivided into two categories: those that cluster by partitioning, and those that cluster by agglomerating. Algorithms that seek to predict continuous labels tend to be derived by adding partial supervision to a manifold learning algorithm. === Partitioning transduction === Partitioning transduction can be thought of as top-down transduction. It is a semi-supervised extension of partition-based clustering. It is typically performed as follows: Consider the set of all points to be one large partition. While any partition P contains two points with conflicting labels: Partition P into smaller partitions. For each partition P: Assign the same label to all of the points in P. Of course, any reasonable partitioning technique could be used with this algorithm. Max flow min cut partitioning schemes are very popular for this purpose. === Agglomerative transduction === Agglomerative transduction can be thought of as bottom-up transduction. It is a semi-supervised extension of agglomerative clustering. It is typically performed as follows: Compute the pair-wise distances, D, between all the points. Sort D in ascending order. Consider each point to be a cluster of size 1. For each pair of points {a,b} in D: If (a is unlabeled) or (b is unlabeled) or (a and b have the same label) Merge the two clusters that contain a and b. Label all points in the merged cluster with the same label. === Continuous Label Transduction === These methods seek to regress continuous labels, often via manifold learning techniques. The idea is to learn a low-dimensional representation of the data and infer values smoothly across the manifold. == Applications and related concepts == Transduction is closely related to: Semi-supervised learning – uses both labeled and unlabeled data but typically induces a model. Case-based reasoning – such as the k-nearest neighbor (k-NN) algorithm, often considered a transductive method. Transductive Support Vector Machines (TSVM) – extend standard SVMs to incorporate unlabeled test data during training. Bayesian Committee Machine (BCM) – an approximation method that makes transductive predictions when exact inference is too costly.

Artificial intelligence in spirituality

Some users of artificial intelligence (AI) technologies, especially chatbots, may develop beliefs that AI has or can attain supernatural or spiritual powers. AI models such as ChatGPT are turned to for fortune telling, mysticism and remote viewing. Recent and sudden advances in large language models have led to folk myths about their origin or capabilities, as well as their deification or worship by some users. Tucker Carlson has made similar claims, including directly to Sam Altman. Pope Leo XIV advised priests against using LLM models when it came to the creation of sermons.

IruSoft

IruSoft (Arabic: آيروسوفت) is an insurance regulatory platform designated for licensing, supervision and inspection of the insurance sector within a country. The platform introduced unique supervision-technology (suptech), insurance-technology (insurtech) and regulatory-technology (regtech) automated modules by which a regulator requires less resources to ensure fairness, transparency and competition and to prevent conflicts of interest in the sector. IruSoft was founded by Abdullah Al-Salloum and owned by the Insurance Regulatory Unit in Kuwait. The Insurance Regulatory Unit optimized processing insurance-sector's customer complaints by issuing Resolution No. (1) of 2022 that introduced IruSoft's complaints public module; an automated resolution center, by which the process of receiving submitted complaints, passing them on to the platforms of licensed insurance companies, tracking matter-related discussions and updates and getting them escalated if unresolved to be discussed by a committee assigned by the unit is integrally automated and analyzed for better key performance indicators.

SGT STAR

SGT STAR, also known as Sgt. Star or Sergeant Star, was a chatbot operated by the United States Army to answer questions about recruitment. == Background == After the September 11 attacks, traffic increased significantly to chatrooms on the U.S. Army's website, goarmy.com, increasing costs of staffing the live chatrooms. As a cost-cutting measure, the SGT STAR project was initiated as a partnership between the United States Army Accessions Command and Spectre AI, a wholly owned subsidiary of Next IT. Next IT, a Spokane, Washington-based company deploys "intelligent virtual assistants," using its software dubbed "ActiveAgent" which is a framework for functional presence engines. Testing began in 2003, and SGT STAR launched to the public in 2006. "STAR" is an acronym for "strong, trained and ready." SGT STAR was launched as a chat interface on goarmy.com, but has since been developed as a mobile application, as well as a life-size animated projection that has appeared live at public events. SGT STAR can also interact with users on Facebook. == FOIA request == In 2013, the Electronic Frontier Foundation filed a Freedom of Information Act request to learn more about SGT STAR, including input and output patterns (questions and answers), usage statistics, contracts, and privacy policies. They received these records in April 2014, after coverage from various media outlets and a tongue-in-cheek campaign to "Free Sgt. Star."

Corpus of Linguistic Acceptability

Corpus of Linguistic Acceptability (CoLA) is a dataset the primary purpose of which is to serve as a benchmark for evaluating the ability of artificial neural networks, including large language models, to judge the grammatical correctness of sentences. It consists of 10,657 English sentences from published linguistics literature that were manually labeled either as grammatical or ungrammatical. == Public version == The publicly available version of CoLA contains 9,594 sentences that belong to training and development sets. It excludes 1,063 sentences reserved for a held-out test set.

SemEval

SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive. This series of evaluations provides a mechanism to characterize in more precise terms exactly what is necessary to compute in meaning. As such, the evaluations provide an emergent mechanism to identify the problems and solutions for computations with meaning. These exercises have evolved to articulate more of the dimensions that are involved in our use of language. They began with apparently simple attempts to identify word senses computationally. They have evolved to investigate the interrelationships among the elements in a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis). The purpose of the SemEval and Senseval exercises is to evaluate semantic analysis systems. "Semantic Analysis" refers to a formal analysis of meaning, and "computational" refer to approaches that in principle support effective implementation. The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation (WSD), each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation. Triggered by the conception of the SEM conference, the SemEval community had decided to hold the evaluation workshops yearly in association with the SEM conference. It was also the decision that not every evaluation task will be run every year, e.g. none of the WSD tasks were included in the SemEval-2012 workshop. == History == === Early evaluation of algorithms for word sense disambiguation === From the earliest days, assessing the quality of word sense disambiguation algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”. Only very recently (2006) had extrinsic evaluations begun to provide some evidence for the value of WSD in end-user applications. Until 1990 or so, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words. === Senseval to SemEval === In April 1997, Martha Palmer and Marc Light organized a workshop entitled Tagging with Lexical Semantics: Why, What, and How? in conjunction with the Conference on Applied Natural Language Processing. At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well. Kilgarriff recalled that there was "a high degree of consensus that the field needed evaluation", and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises. === SemEval's 3, 2 or 1 year(s) cycle === After SemEval-2010, many participants feel that the 3-year cycle is a long wait. Many other shared tasks such as Conference on Natural Language Learning (CoNLL) and Recognizing Textual Entailments (RTE) run annually. For this reason, the SemEval coordinators gave the opportunity for task organizers to choose between a 2-year or a 3-year cycle. The SemEval community favored the 3-year cycle. Although the votes within the SemEval community favored a 3-year cycle, organizers and coordinators had settled to split the SemEval task into 2 evaluation workshops. This was triggered by the introduction of the new SEM conference. The SemEval organizers thought it would be appropriate to associate our event with the SEM conference and collocate the SemEval workshop with the SEM conference. The organizers got very positive responses (from the task coordinators/organizers and participants) about the association with the yearly SEM, and 8 tasks were willing to switch to 2012. Thus was born SemEval-2012 and SemEval-2013. The current plan is to switch to a yearly SemEval schedule to associate it with the SEM conference but not every task needs to run every year. ==== List of Senseval and SemEval Workshops ==== Senseval-1 took place in the summer of 1998 for English, French, and Italian, culminating in a workshop held at Herstmonceux Castle, Sussex, England on September 2–4. Senseval-2 took place in the summer of 2001, and was followed by a workshop held in July 2001 in Toulouse, in conjunction with ACL 2001. Senseval-2 included tasks for Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish and Swedish. Senseval-3 took place in March–April 2004, followed by a workshop held in July 2004 in Barcelona, in conjunction with ACL 2004. Senseval-3 included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, subcategorization acquisition. SemEval-2007 (Senseval-4) took place in 2007, followed by a workshop held in conjunction with ACL in Prague. SemEval-2007 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text. A special issue of Language Resources and Evaluation is devoted to the result. SemEval-2010 took place in 2010, followed by a workshop held in conjunction with ACL in Uppsala. SemEval-2010 included 18 different tasks targeting the evaluation of semantic analysis systems. SemEval-2012 took place in 2012; it was associated with the new SEM, First Joint Conference on Lexical and Computational Semantics, and co-located with NAACL, Montreal, Canada. SemEval-2012 included 8 different tasks targeting at evaluating computational semantic systems. However, there was no WSD task involved in SemEval-2012, the WSD related tasks were scheduled in the upcoming SemEval-2013. SemEval-2013 was associated with NAACL 2013, North American Association of Computational Linguistics, Georgia, USA and took place in 2013. It included 13 different tasks targeting at evaluating computational semantic systems. SemEval-2014 took place in 2014. It was co-located with COLING 2014, 25th International Conference on Computational Linguistics and SEM 2014, Second Joint Conference on Lexical and Computational Semantics, Dublin, Ireland. There were 10 different tasks in SemEval-2014 evaluating various computational semantic systems. SemEval-2015 took place in 2015. It was co-located with NAACL-HLT 2015, 2015 Conference of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies and SEM 2015, Third Joint Conference on Lexical and Computational Semantics, Denver, USA. There were 17 different tasks in SemEval-2015 evaluating various computational semantic systems. == SemEval Workshop framework == The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops ran by ARPA (Advanced Research Projects Agency, renamed the Defense Advanced Research Projects Agency (DARPA)). Stages of SemEval/Senseval evaluation workshops Firstly, all likely participants were invited to express their interest and participate in the exercise design. A timetable towards a final workshop was worked out. A plan for selecting evaluation materials was agreed. 'Gold standards' for the individual tasks were acquired, often human annotators were considered as a gold standard to measure precision and recall scores of computer systems. These 'gold standards' are what the computational systems strive towards. In WSD tasks, human annotators were set on the task of generating a set of correct WSD answers (i.e. the correct sense for a given word in a given context) The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their sets of answers to the organizers. The organizers then scored the answers and the scores were announced and discussed at a workshop. == Semantic evaluation tasks == Senseval-1 & Senseval-2 focused on evaluation WSD systems on major languages that were available corpus and computerized dictionary. Senseval-3 looked beyond the lexemes and started to evaluate systems that looked into wider areas of semantics, such as Semantic Roles (technically known as Theta roles in formal semantics), Logic Form Transformation (commonly semantics of phrases, clauses or sentences were represented

Microsoft Teams

Microsoft Teams is a team collaboration platform developed by Microsoft as part of the Microsoft 365 suite. It offers features such as workspace chat, video conferencing, file storage, and integration with both Microsoft and third-party applications and services. Teams gradually replaced earlier Microsoft messaging and collaboration platforms, including Skype for Business, Skype, Flip, and Microsoft Classroom. The platform saw significant growth during the COVID-19 pandemic, alongside competitors such as Zoom, Slack, and Google Meet, as organizations shifted to remote work and virtual meetings. As of January 2023, Microsoft reported approximately 280 million monthly active users. == History == On August 29, 2007, Microsoft acquired Parlano, the developer of the persistent group chat tool MindAlign. Years later, on March 4, 2016, Microsoft considered acquiring Slack for $8 billion. However, the proposal was reportedly opposed by Bill Gates, who advocated for focusing on enhancing Skype for Business instead. Lu Qi, then executive vice president of Applications and Services, had led the initiative to pursue the Slack acquisition. Following Lu's departure later that year, Microsoft announced Microsoft Teams on November 2, 2016, at an event in New York City, positioning it as a direct competitor to Slack. Teams launched worldwide on March 14, 2017. The service was initially led by corporate vice president Brian MacDonald. In response to the launch, Slack published a full-page advertisement in The New York Times welcoming the competition and outlining its product philosophy. Although Slack was used by 28 companies in the Fortune 100, The Verge wrote that executives would question paying for the service if Teams provides a similar function in their company's existing Office 365 subscription. However, ZDNET noted that the platforms initially served different markets, as Teams did not support external users, making it less appealing to small businesses and freelancers, a limitation Microsoft later addressed. In response to Teams' announcement, Slack deepened in-product integration with Google services. In May 2017, Microsoft announced that Teams would replace Microsoft Classroom in Office 365 Education. A free version of Teams was released on July 12, 2018, offering most core features at no cost, albeit with limits on users and storage. In January 2019, Microsoft introduced updates targeting "Firstline Workers" to improve Teams’ performance across shared or limited-access devices. In September 2019, Microsoft announced the retirement of Skype for Business in favor of Teams, which took effect on July 31, 2021. In early 2020, Microsoft introduced a push-to-talk "Walkie Talkie" feature aimed at firstline workers using smartphones and tablets over Wi-Fi or cellular networks. The COVID-19 pandemic significantly boosted usage of Teams. On March 19, 2020, Microsoft reported 44 million daily active users. In April, the platform logged 4.1 billion meeting minutes in a single day. A public preview of Microsoft Teams for Linux was released in December 2019, but the Linux client was discontinued in 2022. In July 2020, Microsoft shut down its video game livestreaming platform Mixer, and announced that some of its technologies would be repurposed for use in Teams. On February 28, 2025, Microsoft announced that Skype would be fully retired on May 5, 2025, with users given options to export their data or transition to Microsoft Teams. In October 2025, together with other Microsoft 365 suite apps, Teams had its logo updated. == Usage == == Underlying software == Microsoft Teams, as part of the Microsoft 365 suite, utilizes SharePoint and Exchange Online. Each Team, Shared Channel, and Private Channel has its own Microsoft 365 Group and SharePoint Site used for file storage. Messages are stored in Cosmos DB and are journaled to Exchange Online mailboxes. Private messages, including messages in Private Channels, are journaled to the sender and recipients' mailboxes. Public Channel messages are journaled to their corresponding Team's group mailbox, whereas, messages from Shared Channels are journaled to their own mailboxes. Contacts and voicemail are stored in Exchange Online. Microsoft Teams client is a web-based desktop app, originally developed on top of the Electron framework which combines the Chromium rendering engine and the Node.js JavaScript platform. Version 2.0 client was rebuilt using the Evergreen version of Microsoft Edge WebView2 in place of Electron. == Features == === Chats === Teams allows users to communicate in two-way persistent chats with one or multiple participants. Participants can message using text, emojis, stickers and gifs, as well as sharing links and files. In August 2022, the chat feature was updated for "chat with yourself"; allowing for the organization of files, notes, comments, images, and videos within a private chat tab. === Teams === Teams allows communities, groups, or teams to contribute in a shared workspace where messages and digital content on a specific topic are shared. Team members can join through an invitation sent by a team administrator or owner or sharing of a specific URL. Teams for Education allows admins and teachers to set up groups for classes, professional learning communities (PLCs), staff members, and everyone. === Channels === Channels allow team members to communicate without the use of email or group SMS (texting). Users can reply to posts with text, images, GIFs, and image macros. Direct messages send private messages to designated users rather than the entire channel. Connectors can be used within a channel to submit information contacted through a third-party service. Connectors include Mailchimp, Facebook Pages, Twitter, Power BI and Bing News. === Group conversations === Ad-hoc groups can be created to share instant messaging, audio calls (VoIP), and video calls inside the client software. === Telephone replacement === A feature on one of the higher cost licencing tiers allows connectivity to the public switched telephone network (PSTN) telephone system. This allows users to use Teams as if it were a telephone, making and receiving calls over the PSTN, including the ability to host "conference calls" with multiple participants. === Meeting === Meetings can be scheduled with multiple participants able to share audio, video, chat and presented content with all participants. Multiple users can connect via a meeting link. Automated minutes are possible using the recording and transcript features. Teams has a plugin for Microsoft Outlook to schedule a Teams Meeting in Outlook for a specific date and time and invite others to attend. If a meeting is scheduled within a channel, users visiting the channel are able to see if a meeting is in progress. ==== Teams Live Events ==== Teams Live Events replaces Skype Meeting Broadcast for users to broadcast to 10,000 participants on Teams, Yammer, or Microsoft Stream. ==== Breakout Rooms ==== Breakout rooms split a meeting into small groups. This is often utilized for collaboration during trainings or any environment where having all participants speak at once could be disruptive or unfeasible. Breakout rooms can be set by the hosts to a certain length of time, after which all participants will automatically rejoin the main meeting room. ==== Front Row ==== Front Row adjusts the layout of the viewer's screen, placing the speaker or content in the center of the gallery with other meeting participant's video feeds reduced in size and located below the speaker. === Education === Microsoft Teams for Education allows teachers to distribute, provide feedback, and grade student assignments turned in via Teams using the Assignments tab through Office 365 for Education subscribers. Quizzes can also be assigned to students through an integration with Office Forms. === Protocols === Microsoft Teams is based on a number of Microsoft-specific protocols. Video conferences are realized over the protocol MNP24, known from the Skype consumer version. VoIP and video conference clients based on SIP and H.323 need special gateways to connect to Microsoft Teams servers. With the help of Interactive Connectivity Establishment (ICE), clients behind Network address translation routers and restrictive firewalls are also able to connect, if peer-to-peer is not possible. === Integrations === Microsoft Teams has integrations through Microsoft AppSource, its integration marketplace. In 2020, Microsoft partnered with KUDO, a cloud-based solution with language interpretation, to allow integrated language meeting controls. In June 2022, an update was released using AI to improve call audio through the elimination of background feedback loops and cancelling non-vocal audio. == Anti-trust controversy == In July 2023, the European Commission opened an anti-trust investigation into the possibility that Microsoft unfairly used its office suite market power to increase sales of Teams and hurt