AI For Business Microsoft

AI For Business Microsoft — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Fuse Mediation Router

    Fuse Mediation Router

    Fuse Mediation Router is an open source tool for integrating services using Enterprise Integration Patterns based on Apache Camel for use in enterprise IT organizations. It is certified, productized and fully supported by the people who wrote the code. Fuse Mediation Router uses a standard method of notation to go from diagram to implementation without coding. Fuse Mediation Router is a rule-based routing and process mediation engine that combines the ease of basic POJO development with the clarity of the standard Enterprise Integration Patterns. It can be deployed inside any container or be used stand-alone, and works directly with any kind of transport or messaging model to rapidly integrate existing services and applications. Fuse Mediation Router is now a part of Red Hat JBoss Fuse. == Tooling == FuseSource offers graphical, Eclipse-based tooling for Apache Camel for download.
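The rule-based routing described above can be illustrated with a content-based router, one of the Enterprise Integration Patterns. The following is a minimal plain-Python sketch, not the Apache Camel or Fuse Mediation Router API; the class `ContentBasedRouter`, its methods, and the endpoint names such as `queue:orders` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Rule = Tuple[Callable[[Message], bool], str]


@dataclass
class ContentBasedRouter:
    """Hypothetical router: sends a message to the first endpoint whose rule matches."""

    rules: List[Rule] = field(default_factory=list)
    default_endpoint: str = "dead-letter"  # assumed fallback name

    def when(self, predicate: Callable[[Message], bool], endpoint: str) -> "ContentBasedRouter":
        # Register a rule; returning self allows rules to be chained.
        self.rules.append((predicate, endpoint))
        return self

    def route(self, message: Message) -> str:
        # Rules are evaluated in registration order.
        for predicate, endpoint in self.rules:
            if predicate(message):
                return endpoint
        return self.default_endpoint


router = (
    ContentBasedRouter()
    .when(lambda m: m.get("type") == "order", "queue:orders")
    .when(lambda m: m.get("type") == "invoice", "queue:invoices")
)

print(router.route({"type": "order"}))
print(router.route({"type": "unknown"}))
```

Keeping the rules as data, separate from the transport, is what lets the same routing logic sit in front of different messaging systems.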

    Read more →
  • Signals intelligence

    Signals intelligence

    Signals intelligence (SIGINT) is the act and field of intelligence-gathering by interception of signals, whether communications between people (communications intelligence, abbreviated to COMINT) or from electronic signals not directly used in communication (electronic intelligence, abbreviated to ELINT). As classified and sensitive information is usually encrypted, signals intelligence may necessarily involve cryptanalysis (to decipher the messages). Traffic analysis, the study of who is signaling to whom and in what quantity, is also used to integrate information, and it may complement cryptanalysis. == History == === Origins === Electronic interceptions appeared as early as 1900, during the Boer War of 1899–1902. The British Royal Navy had installed wireless sets produced by Marconi on board their ships in the late 1890s, and the British Army used some limited wireless signalling. The Boers captured some wireless sets and used them to make vital transmissions. Since the British were the only people transmitting at the time, no special interpretation of the signals was necessary. The birth of signals intelligence in a modern sense dates from the Russo-Japanese War of 1904–1905. As the Russian fleet prepared for conflict with Japan in 1904, the British ship HMS Diana, stationed in the Suez Canal, intercepted Russian naval wireless signals being sent out for the mobilization of the fleet, for the first time in history. === Development in World War I === Over the course of the First World War, a new method of signals intelligence reached maturity. Russia's failure to properly protect its communications fatally compromised the Russian Army's advance early in World War I and led to their disastrous defeat by the Germans under Ludendorff and Hindenburg at the Battle of Tannenberg. In 1918, French intercept personnel captured a message written in the new ADFGVX cipher, which was cryptanalyzed by Georges Painvin. 
This gave the Allies advance warning of the German 1918 Spring Offensive. The British in particular, built up great expertise in the newly emerging field of signals intelligence and codebreaking (synonymous with cryptanalysis). On the declaration of war, Britain cut all German undersea cables. This forced the Germans to communicate exclusively via either (A) a telegraph line that connected through the British network and thus could be tapped; or (B) through radio which the British could then intercept. Rear Admiral Henry Oliver appointed Sir Alfred Ewing to establish an interception and decryption service at the Admiralty; Room 40. An interception service known as 'Y' service, together with the post office and Marconi stations, grew rapidly to the point where the British could intercept almost all official German messages. The German fleet was in the habit each day of wirelessing the exact position of each ship and giving regular position reports when at sea. It was possible to build up a precise picture of the normal operation of the High Seas Fleet, to infer from the routes they chose where defensive minefields had been placed and where it was safe for ships to operate. Whenever a change to the normal pattern was seen, it immediately signalled that some operation was about to take place, and a warning could be given. Detailed information about submarine movements was also available. The use of radio-receiving equipment to pinpoint the location of any single transmitter was also developed during the war. Captain H.J. Round, working for Marconi, began carrying out experiments with direction-finding radio equipment for the army in France in 1915. By May 1915, the Admiralty was able to track German submarines crossing the North Sea. Some of these stations also acted as 'Y' stations to collect German messages, but a new section was created within Room 40 to plot the positions of ships from the directional reports. 
Room 40 played an important role in several naval engagements during the war, notably in detecting major German sorties into the North Sea. The Battle of Dogger Bank was won in no small part due to the intercepts that allowed the Navy to position its ships in the right place. Room 40 played a vital role in subsequent naval clashes, including at the Battle of Jutland, when the British fleet was sent out to intercept the German fleet. The direction-finding capability allowed for the tracking and location of German ships, submarines, and Zeppelins. The system was so successful that by the end of the war, over 80 million words, comprising the totality of German wireless transmission over the course of the war, had been intercepted by the operators of the Y-stations and decrypted. However, its most astonishing success was in decrypting the Zimmermann Telegram, a telegram from the German Foreign Office sent via Washington to its ambassador Heinrich von Eckardt in Mexico. === Postwar consolidation === With the importance of interception and decryption firmly established by the wartime experience, countries established permanent agencies dedicated to this task in the interwar period. In 1919, the British Cabinet's Secret Service Committee, chaired by Lord Curzon, recommended that a peace-time codebreaking agency should be created. The Government Code and Cypher School (GC&CS) was the first peace-time codebreaking agency, with a public function "to advise as to the security of codes and cyphers used by all Government departments and to assist in their provision", but also with a secret directive to "study the methods of cypher communications used by foreign powers". GC&CS officially formed on 1 November 1919, and produced its first decrypt prior to that date, on 19 October. By 1940, GC&CS was working on the diplomatic codes and ciphers of 26 countries, tackling over 150 diplomatic cryptosystems. 
The US Cipher Bureau was established in 1919 and achieved some success at the Washington Naval Conference in 1921, through cryptanalysis by Herbert Yardley. Secretary of War Henry L. Stimson closed the US Cipher Bureau in 1929 with the words "Gentlemen do not read each other's mail." === World War II === The use of SIGINT had even greater implications during World War II. The combined effort of intercepts and cryptanalysis for the whole of the British forces in World War II came under the code name "Ultra", managed from Government Code and Cypher School at Bletchley Park. Properly used, the German Enigma and Lorenz ciphers should have been virtually unbreakable, but flaws in German cryptographic procedures, and poor discipline among the personnel carrying them out, created vulnerabilities which made Bletchley's attacks feasible. Bletchley's work was essential to defeating the U-boats in the Battle of the Atlantic, and to the British naval victories in the Battle of Cape Matapan and the Battle of North Cape. In 1941, Ultra exerted a powerful effect on the North African desert campaign against German forces under General Erwin Rommel. General Sir Claude Auchinleck wrote that were it not for Ultra, "Rommel would have certainly got through to Cairo". Ultra decrypts featured prominently in the story of Operation SALAM, László Almásy's mission across the desert behind Allied lines in 1942. Prior to the Normandy landings on D-Day in June 1944, the Allies knew the locations of all but two of Germany's fifty-eight Western Front divisions. Winston Churchill was reported to have told King George VI: "It is thanks to the secret weapon of General Menzies, put into use on all the fronts, that we won the war!" Supreme Allied Commander, Dwight D. Eisenhower, at the end of the war, described Ultra as having been "decisive" to Allied victory. 
Official historian of British Intelligence in World War II Sir Harry Hinsley argued that Ultra shortened the war "by not less than two years and probably by four years"; and that, in the absence of Ultra, it is uncertain how the war would have ended. At a lower level, German cryptanalysis, direction finding, and traffic analysis were vital to Rommel's early successes in the Western Desert Campaign until British forces tightened their communications discipline and Australian raiders destroyed his principal SIGINT Company. == Technical definitions == The United States Department of Defense has defined the term "signals intelligence" as: A category of intelligence comprising either individually or in combination all communications intelligence (COMINT), electronic intelligence (ELINT), and foreign instrumentation signals intelligence (FISINT), however transmitted. Intelligence derived from communications, electronic, and foreign instrumentation signals. Being a broad field, SIGINT has many sub-disciplines. The two main ones are communications intelligence (COMINT) and electronic intelligence (ELINT). == Disciplines shared across the branches == === Targeting === A collection system has to know to look for a particular signal. "System", in this context, has several nuances. Targeting is the process of developing collection requirements: "1. A

    Read more →
  • Content engineering

    Content engineering

    Content engineering is a term applied to an engineering specialty dealing with the complexities around the use of content in computer-facilitated environments. Content authoring and production, content management, content modeling, content conversion, and content use and repurposing are all areas involving this practice. It is not a specialty with wide industry recognition and is often performed on an ad hoc basis by members of software development or content production or marketing staff, but is beginning to be recognized as a necessary function in any complex content-centric project involving both content production as well as software system development mainly involving content management systems (CMS) or digital experience platforms (DXP). Content engineering tends to bridge the gap between groups involved in the production of content (publishing and editorial staff, marketing, sales, human resources) and more technologically oriented departments such as software development, or IT that put this content to use in web or other software-based environments, and requires an understanding of the issues and processes of both sides. Typically, content engineering involves extensive use of embedded XML technologies, XML being the most widespread language for representing structured content. Content management systems are a key technology often used in the practice of content engineering. == Definition == Content engineering is the practice of organizing the shape and structure of content by deploying content and metadata models, in authoring and publishing processes in a manner that meets the requirements of an organization's Content Strategy, and its implementation through the use of technology such as CMS, XML, schema markup, artificial intelligence, APIs and others. 
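The content and metadata models mentioned in the definition above can be made concrete with a small example. The sketch below assumes a hypothetical `Article` content type and serializes it as schema.org JSON-LD; the field selection and sample values are illustrative only and are not taken from any particular CMS.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    """Hypothetical content model: structured fields instead of a free-text blob."""

    headline: str
    author: str
    date_published: str  # ISO 8601 date, e.g. "2024-01-15"
    keywords: List[str] = field(default_factory=list)

    def to_json_ld(self) -> dict:
        # Map the model's fields onto schema.org Article properties.
        return {
            "@context": "https://schema.org",
            "@type": "Article",
            "headline": self.headline,
            "author": {"@type": "Person", "name": self.author},
            "datePublished": self.date_published,
            "keywords": ", ".join(self.keywords),
        }


article = Article(
    headline="Structuring content for reuse",
    author="A. Writer",
    date_published="2024-01-15",
    keywords=["content modeling", "metadata"],
)
doc = article.to_json_ld()
print(json.dumps(doc, indent=2))
```

Because the model is separate from its rendering, the same `Article` instance could feed a web page, a search index, or an API response without re-authoring.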
== Purpose and goal == In very general terms, content engineering practices aim to maximize the ROI of content through content reuse and by improving the efficiency of content marketing, content operations, and content strategy. Content engineering can help address content challenges that fairly typical organizations face: siloed content supply chains; duplicate content in a myriad of formats; inefficient content authoring workflows; chunky, unstructured content; outdated technology; technology in place that does not match needs; inability to reuse content across channels (multi-channel content); metadata and schema that are not used; lack of standards for metadata; lack of findability of content for internal and external use; poor SEO performance; and inability to implement personalization. == Key skills == Content engineering draws on a combination of technical, strategic, and editorial competencies. Practitioners typically require proficiency across several domains: === Content modeling and information architecture === Content engineers design structured content models that define how content is created, stored, and distributed. This includes building taxonomies, ontologies, and metadata schemas that enable content reuse across channels and platforms. === Structured content and markup languages === Proficiency in XML, JSON, HTML, and schema.org markup is fundamental. Content engineers use these languages to structure content for machine readability, search engine optimization, and interoperability between systems. === Content management systems and platforms === Content engineers require working knowledge of content management systems (CMS), digital experience platforms (DXP), and headless CMS architectures. This includes configuring content types, workflows, and publishing pipelines within these systems. === Workflow design and automation === Designing and implementing content workflows, from authoring through review, approval, and distribution, is a core function. 
Increasingly, this involves configuring AI-assisted and agentic workflows that automate research, drafting, repurposing, and distribution tasks at scale. === Content strategy and editorial understanding === Unlike purely technical roles, content engineering requires a working understanding of content strategy, brand management, editorial standards, and audience analysis. Content engineers must translate strategic objectives into technical content structures and system configurations. === API integration and data interoperability === Content engineers work with APIs to connect content systems, analytics platforms, distribution channels, and third-party services. Understanding how content flows between systems is essential for enabling multi-channel publishing and content personalization. === Analytics and performance measurement === Measuring content effectiveness through web analytics, SEO performance data, and engagement metrics informs how content engineers refine structures, metadata, and distribution workflows. == The role of a content engineer == Content engineers bridge the divide between content strategists and producers and the developers and content managers who publish and distribute content. But rather than simply wedging themselves between these players, content engineers help define and facilitate the content structure during the entire content strategy, production and distribution cycle from beginning to end. As the role has evolved, content engineers are increasingly expected to build and manage AI-powered content systems, moving beyond traditional CMS configuration into agentic workflows that automate content research, production, and distribution. By integrating skills in business and technology, content engineers do not see content as static or finished. Rather, they look at the value of the content and how it can best be adapted and personalized to serve customers and emerging content platforms, technologies, and opportunities. 
=== Create customer experience === Content marketing suffers from two fundamental limitations that constrain the true power and potential that a great content marketing plan can bring to a business's bottom line. Content relevance: how to make content more relevant and personalized to its audiences. The marketer and content strategist direct the customer experience itself, and the content engineer makes it happen with content structure, schema, metadata, microdata, taxonomy, and CMS topology. Content agility: Marketers who are burdened with one-size-fits-all content remain stuck managing their content rather than their customers' experience. Content engineers give marketers the "super powers" to move content-powered experiences across interfaces and personalization variants. === Break down barriers === Empower content strategists: Content engineers work with content strategists by helping them treat content not as a fixed message, but as a modular construct which can be channeled and manipulated. Enable content producers: A content engineer will work with a content producer by helping to find new sources of content and ways the content can be combined and presented. Guide and free developers: The content engineer helps translate marketing strategy into clear technical needs and functions developers can build into content management systems. Enhance content management: Content engineers develop content structures that make it easier for content writers and content managers to author in a single, highly usable interface, even for complex content types that might contain dozens of elements. Engineer content for success: Content engineers help all members of a marketing team work more smoothly, with the support and structures needed to get the most out of the content they produce. === Salary benchmarks === Content engineering roles command significantly higher salaries than traditional content marketing positions. 
In the United States, IC-level content engineers earn between $120,000 and $165,000 annually, while senior roles reach $160,000 to $220,000. Head of content engineering positions range from $200,000 to $280,000, and VP-level roles can exceed $375,000. The emergence of dedicated content engineer job postings from companies such as Exit Five reflects the growing recognition of the role as a distinct function within marketing organizations.

    Read more →
  • Open Data-Link Interface

    Open Data-Link Interface

    The Open Data-Link Interface (ODI) is an application programming interface (API) for network interface controllers (NICs) developed by Apple and Novell. The API serves the same function as Microsoft and 3COM's Network Driver Interface Specification (NDIS). Originally, ODI was written for NetWare and Macintosh environments. Like NDIS, ODI provides rules that establish a vendor-neutral interface between the protocol stack and the adapter driver. It resides in Layer 2, the Data Link layer, of the OSI model. This interface also enables one or more network drivers to support one or more protocol stacks.
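The last point, several drivers serving several protocol stacks through one vendor-neutral interface, can be sketched as a small dispatcher. This is an illustration only and not the ODI specification: the class `LinkLayerMux`, its methods, and the driver names are hypothetical, and the protocol IDs reuse the EtherType values 0x8137 (IPX) and 0x0800 (IPv4) purely as examples.

```python
from typing import Callable, Dict, List, Tuple

# A protocol stack is modeled as a callback taking (driver name, payload).
FrameHandler = Callable[[str, bytes], None]


class LinkLayerMux:
    """Hypothetical dispatcher sitting between adapter drivers and protocol stacks."""

    def __init__(self) -> None:
        self._stacks: Dict[int, FrameHandler] = {}

    def bind_stack(self, protocol_id: int, handler: FrameHandler) -> None:
        # A protocol stack registers the protocol ID it wants to receive.
        self._stacks[protocol_id] = handler

    def receive(self, driver: str, protocol_id: int, payload: bytes) -> bool:
        # Any driver hands frames to the same entry point; the mux picks the stack.
        handler = self._stacks.get(protocol_id)
        if handler is None:
            return False  # no stack bound for this protocol
        handler(driver, payload)
        return True


log: List[Tuple[str, str, bytes]] = []
mux = LinkLayerMux()
mux.bind_stack(0x8137, lambda drv, data: log.append(("IPX", drv, data)))
mux.bind_stack(0x0800, lambda drv, data: log.append(("IP", drv, data)))

# Two different (hypothetical) drivers deliver frames for two different stacks.
mux.receive("ethernet0", 0x8137, b"ipx-frame")
mux.receive("tokenring0", 0x0800, b"ip-packet")
unclaimed = mux.receive("ethernet0", 0x86DD, b"ipv6-packet")
```

Neither the drivers nor the stacks know about each other here; each only knows the dispatcher, which is the property the paragraph above attributes to ODI and NDIS.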

    Read more →
  • MyPertamina

    MyPertamina

    MyPertamina is a digital financial service platform from Pertamina that is integrated with the LinkAja app. The application is used for non-cash fuel payments at Pertamina's public fueling stations. == History == Originally, MyPertamina was a chain of merchandise outlets for Pertamina products. It was launched on December 21, 2016, with three outlets in Jakarta. The outlets sold clothes, hats, and other products carrying Pertamina product brands. One month later (January 2017), Pertamina and Bank Mandiri entered into a partnership to launch the Mandiri Credit Card Pertamina Mastercard product, so that consumers could pay by card when filling up at Pertamina gas stations. In August 2017, the MyPertamina app and electronic card were launched through the MyPertamina Loyalty program at the Gaikindo Indonesia International Auto Show 2017. The card can be used on EDC machines for non-cash payments. The card balance is held in the app and can be topped up through ATMs and online banking.

    Read more →
  • Data thinking

    Data thinking

    Data Thinking is a framework that integrates data science with the design process. It combines computational thinking, statistical thinking, and domain-specific knowledge to guide the development of data-driven solutions in product development. The framework is used to explore, design, develop, and validate solutions, with a focus on user experience and data analytics, including data collection and interpretation. The framework aims to apply data literacy and inform decision-making through data-driven insights. == Major components == According to "Computational thinking in the era of data science": Data thinking involves understanding that solutions require both data-driven and domain-knowledge-driven rules. Data thinking evaluates whether data accurately represents real-life scenarios and improves data collection where necessary. The framework highlights the importance of preserving domain-specific meaning during data analysis. Data thinking incorporates statistical and logical analysis to identify patterns and irregularities. Data thinking involves testing solutions in real-life contexts and iteratively improving models based on new data. The process requires evaluating problems from multiple abstraction levels and understanding the potential for biases in generalizations. == Major phases == === Strategic context and risk analysis === Analyzing the broader digital strategy and assessing risks and opportunities is a common step before beginning a project. Techniques like coolhunting, trend analysis, and scenario planning can be used to assist with this. === Ideation and exploration === In this phase, focus areas are identified, and use cases are developed by integrating organizational goals, user needs, and data requirements. Design thinking methods, such as personas and customer journey mapping, are applied. 
=== Prototyping === A proof of concept is created to test feasibility and refine solutions through iterative evaluation to optimize for effective performance. === Implementation and monitoring === Solutions are tested and monitored for performance and continual improvement. == Implementing Data Thinking == The following resources explain more about data thinking and its applications: "Data Thinking: Framework for data-based solutions" by StackFuel; "What is Data Thinking? A modern approach to designing a data strategy" by Mantel Group; and "Data Science Thinking" by SpringerLink. These sources provide detailed insights into the methodology, phases, and benefits of adopting Data Thinking in organizational processes.

    Read more →
  • Social profiling

    Social profiling

    Social profiling is the process of constructing a social media user's profile using their social data. In general, profiling refers to the data science process of generating a person's profile with computerized algorithms and technology. There are various platforms for sharing this information with the proliferation of growing popular social networks, including but not limited to LinkedIn, Google+, Facebook and Twitter. == Social profile and social data == A person's social data refers to the personal data that they generate either online or offline (for more information, see social data revolution). A large amount of these data, including one's language, location and interest, is shared through social media and social network. Users join multiple social media platforms and their profiles across these platforms can be linked using different methods to obtain their interests, locations, content, and friend list. Altogether, this information can be used to construct a person's social profile. Meeting the user's satisfaction level for information collection is becoming more challenging. This is because of too much "noise" generated, which affects the process of information collection due to explosively increasing online data. Social profiling is an emerging approach to overcome the challenges faced in meeting user's demands by introducing the concept of personalized search while keeping in consideration user profiles generated using social network data. A study reviews and classifies research inferring users social profile attributes from social media data as individual and group profiling. The existing techniques along with utilized data sources, the limitations, and challenges were highlighted. The prominent approaches adopted include machine learning, ontology, and fuzzy logic. Social media data from Twitter and Facebook have been used by most of the studies to infer the social attributes of users. 
The literature showed that user social attributes, including age, gender, home location, wellness, emotion, opinion, relation, and influence, still need to be explored. === Personalized meta-search engines === Ever-increasing online content has reduced the effectiveness of centralized search engines' results, which can no longer satisfy users' demand for information. A possible solution that would increase coverage of search results is the meta-search engine, an approach that collects information from numerous centralized search engines. A new problem thus emerges: too much data and too much noise are generated in the collection process. Therefore, a new technique called personalized meta-search engines was developed. It makes use of a user's profile (largely the social profile) to filter the search results. A user's profile can be a combination of a number of things, including but not limited to, "a user's manual selected interests, user's search history", and personal social network data. == Social media profiling == According to Samuel D. Warren II and Louis Brandeis (1890), disclosure of private information and the misuse of it can hurt people's feelings and cause considerable damage in people's lives. Social networks provide people access to intimate online interactions; therefore, information access control, information transactions, privacy issues, connections and relationships on social media have become important research fields and are subjects of concern to the public. Ricard Fogues and other co-authors state that "any privacy mechanism has at its base an access control", which dictates "how permissions are given, what elements can be private, how access rules are defined, and so on". Current access control for social media accounts still tends to be very simplistic: there is very limited diversity in the categories of relationships on social network accounts. 
Users' relationships to others are, on most platforms, categorized only as "friend" or "non-friend", so people may leak important information to "friends" inside their social circle who are not necessarily the users with whom they consciously want to share it. The section below is concerned with social media profiling and what profiling information on social media accounts can achieve. === Privacy leaks === A lot of information is voluntarily shared on online social networks, such as photos and updates on life activities (new job, hobbies, etc.). People rest assured that different social network accounts on different platforms will not be linked as long as they do not grant permission to these links. However, according to Diane Gan, information gathered online enables "target subjects to be identified on other social networking sites such as Foursquare, Instagram, LinkedIn, Facebook and Google+, where more personal information was leaked". The majority of social networking platforms use the "opt out approach" for their features. If users wish to protect their privacy, it is the user's own responsibility to check and change the privacy settings, as a number of them are left at their default options. Major social network platforms have developed geo-tagging functions, which are in popular use. This is concerning because 39% of users have experienced profile hacking; 78% of burglars have used major social media networks and Google Street View to select their victims; and an astonishing 54% of burglars attempted to break into empty houses when people posted their status updates and geo-locations. === Facebook === Formation and maintenance of social media accounts and their relationships with other accounts are associated with various social outcomes. As of 2015, customer relationship management was essential for many firms and was partially done through Facebook. 
Before the emergence and prevalence of social media, customer identification was primarily based upon information that a firm could directly acquire: for example, through a customer's purchasing process or the voluntary act of completing a survey or joining a loyalty program. However, the rise of social media has greatly reduced reliance on this approach of building a customer's profile or model from directly available data. Marketers now increasingly seek customer information through Facebook; this may include a variety of information users disclose to all users or some users on Facebook: name, gender, date of birth, e-mail address, sexual orientation, marital status, interests, hobbies, favorite sports team(s), favorite athlete(s), or favorite music, and more importantly, Facebook connections. However, due to the privacy policy design, acquiring true information on Facebook is no trivial task. Often, Facebook users either refuse to disclose true information (sometimes using pseudonyms) or set information to be visible only to friends. Facebook users who "LIKE" a page are also hard to identify. To profile and cluster users online, marketers and companies can and will access the following kinds of data: gender, the IP address and city of each user through the Facebook Insight page, who "LIKED" a certain user, a list of all the pages that a person "LIKED" (transaction data), other people that a user follows (even beyond the first 500, which usually cannot be seen) and all the publicly shared data. === Twitter === First launched on the Internet in March 2006, Twitter is a platform on which users can connect and communicate with any other user in just 280 characters. Like Facebook, Twitter is also a crucial channel through which users leak important information, often unconsciously, that can be accessed and collected by others. 
According to Rachel Nuwer, in a sample of 10.8 million tweets by more than 5,000 users, the posted and publicly shared information was enough to reveal a user's income range. Daniel Preoţiuc-Pietro, a postdoctoral researcher at the University of Pennsylvania, and his colleagues were able to categorize 90% of users into corresponding income groups. Their collected data, after being fed into a machine-learning model, generated reliable predictions on the characteristics of each income group. The mobile app called Streamd.in displays live tweets on Google Maps by using geo-location details attached to the tweet, and traces the user's movement in the real world. === Profiling photos on social network === The advent and universality of social media networks have boosted the role of images and visual information dissemination. Many types of visual information on social media transmit messages from the author, location information and other personal information. For example, a user may post a photo of themselves in which landmarks are visible, which can enable other users to determine where they are. A study by Cristina Segalin, Dong Seon Cheng and Marco Cristani found that profiling the photos in users' posts can reveal personal traits such as personality and mood. In the study, convolutional neural networks (CNNs) are introduced. It builds on the main characteristics of computational

    Read more →
  • Sysomos

    Sysomos

    Sysomos Inc. is a Toronto-based social media analytics company owned by Outside Insight market leaders Meltwater. The company developed text analytics and machine learning technologies for user generated content, and served 80% of the top agencies and Fortune 500. == History == Sysomos was founded by Nilesh Bansal and Nick Koudas. The company is a spinoff of the University of Toronto research project BlogScope. The BlogScope project, which started in 2005, resulted in creation of the underlying content aggregation and analysis engine commercialized by Sysomos. The company raised venture capital in 2008 and was acquired by Marketwire in 2010. The company's original flagship product, Media Analysis Platform (MAP), mines and analyzes content from social media or user-generated content to create a picture of media coverage. Sysomos launched its flagship offering MAP in Sept 2007, followed by addition of Heartbeat to its product suite in 2009. In addition to the two main products, the company released FourWhere, a free location-based social search service that mashes up Foursquare in March 2010. The company also offers Sysomos Heartbeat which provides social media monitoring and engagement capabilities to communication professionals, brand managers and customer support groups. In 2013, Heartbeat was extended to add publishing components to deliver a complete end-to-end social media marketing platform. On July 6, 2010, it was announced that Marketwire, a press release distribution company, had acquired Sysomos. After the acquisition, Sysomos founders Nick Koudas and Nilesh Bansal, left Sysomos to start Aislelabs. In February 2015, Sysomos split from Marketwired, as an independent company, and appointed Adnan Ahmed as the new CEO. In March 2015, newly independent Sysomos launched a redesign for its Heartbeat product and a new API for its MAP product. In the same year, the company acquired Expion. In September 2016, Peter Heffring was announced as the new CEO. 
In April 2017, Sysomos showcased a new unified platform offering new insights. In April 2018, media monitoring firm Meltwater announced it had acquired Sysomos. The CEO of Sysomos, Peter Heffring, said the company will continue to operate as an independent unit of Meltwater. Heffring will run the social analytics division of Meltwater. == Reports == Inside Twitter series of reports is the most extensive third-party survey on Twitter's growth and demographics. Another extensive survey regarding the top 5% of most active Twitter users found that over 25% of all tweets are machine created. The report also confirms Twitter's international growth. Inside Facebook Pages report found that only four percent of pages have more than 10,000 fans, 0.76% of pages have more than 100,000 fans, and 0.05% of pages (or 297 in total) have more than a million fans. Inside YouTube reports focus more on video hosting services and YouTube.

    Read more →
  • Weak artificial intelligence

    Weak artificial intelligence

    Weak artificial intelligence (weak AI) is artificial intelligence that implements a limited part of the mind, or, as narrow AI, artificial narrow intelligence (ANI), is focused on one narrow task. Weak AI is contrasted with strong AI, which can be interpreted in various ways: Artificial general intelligence (AGI): a machine with the ability to apply intelligence to any problem, rather than just one specific problem. Artificial superintelligence (ASI): a machine with a vastly superior intelligence to the average human being. Artificial consciousness: a machine that has consciousness, sentience and mind (John Searle uses "strong AI" in this sense). Narrow AI can be classified as being "limited to a single, narrowly defined task. Most modern AI systems would be classified in this category." Artificial general intelligence is conversely the opposite. == Applications and risks == Some examples of narrow AI are AlphaGo, self-driving cars, robot systems used in the medical field, and diagnostic doctors. Narrow AI systems are sometimes dangerous if unreliable. And the behavior that it follows can become inconsistent. It could be difficult for the AI to grasp complex patterns and get to a solution that works reliably in various environments. This "brittleness" can cause it to fail in unpredictable ways. Narrow AI failures can sometimes have significant consequences. It could for example cause disruptions in the electric grid, damage nuclear power plants, cause global economic problems, and misdirect autonomous vehicles. Medicines could be incorrectly sorted and distributed. Also, medical diagnoses can ultimately have serious and sometimes deadly consequences if the AI is faulty or biased. Simple AI programs have already worked their way into society, oftentimes unnoticed by the public. Autocorrection for typing, speech recognition for speech-to-text programs, and vast expansions in the data science fields are examples. 
Narrow AI has also been the subject of some controversy, including resulting in unfair prison sentences, discrimination against women in the workplace for hiring, resulting in death via autonomous driving, among other cases. Despite being "narrow" AI, recommender systems are efficient at predicting user reactions based on their posts, patterns, or trends. For instance, TikTok's "For You" algorithm can determine a user's interests or preferences in less than an hour. Some other social media AI systems are used to detect bots that may be involved in propaganda or other potentially malicious activities. == Weak AI versus strong AI == John Searle contests the possibility of strong AI (by which he means conscious AI). He further believes that the Turing test (created by Alan Turing and originally called the "imitation game", used to assess whether a machine can converse indistinguishably from a human) is not accurate or appropriate for testing whether an AI is "strong". Scholars such as Antonio Lieto have argued that the current research on both AI and cognitive modelling are perfectly aligned with the weak-AI hypothesis (that should not be confused with the "general" vs "narrow" AI distinction) and that the popular assumption that cognitively inspired AI systems espouse the strong AI hypothesis is ill-posed and problematic since "artificial models of brain and mind can be used to understand mental phenomena without pretending that that they are the real phenomena that they are modelling" (as, on the other hand, implied by the strong AI assumption).

    Read more →
  • Format-preserving encryption

    Format-preserving encryption

    In cryptography, format-preserving encryption (FPE) refers to encrypting in such a way that the output (the ciphertext) is in the same format as the input (the plaintext). The meaning of "format" varies. Typically only finite sets of characters are used: numeric, alphabetic or alphanumeric. For example: Encrypting a 16-digit credit card number so that the ciphertext is another 16-digit number. Encrypting an English word so that the ciphertext is another English word. Encrypting an n-bit number so that the ciphertext is another n-bit number (this is the definition of an n-bit block cipher). For such finite domains, and for the purposes of the discussion below, the cipher is equivalent to a permutation of N integers {0, ... , N−1} where N is the size of the domain. == Motivation == === Restricted field lengths or formats === One motivation for using FPE comes from the problems associated with integrating encryption into existing applications with well-defined data models. A typical example would be a credit card number, such as 1234567812345670 (16 bytes long, digits only). Adding encryption to such applications might be challenging if data models are to be changed, as it usually involves changing field length limits or data types. For example, output from a typical block cipher would turn a credit card number into a hexadecimal value (e.g. 0x96a45cbcf9c2a9425cde9e274948cb67, 34 bytes, hexadecimal digits) or a Base64 value (e.g. lqRcvPnCqUJc3p4nSUjLZw==, 24 bytes, alphanumeric and special characters), which will break any existing applications expecting the credit card number to be a 16-digit number. Apart from simple formatting problems, using AES-128-CBC, this credit card number might get encrypted to the hexadecimal value 0xde015724b081ea7003de4593d792fd8b695b39e095c98f3a220ff43522a2df02. 
In addition to the problems caused by creating invalid characters and increasing the size of the data, data encrypted using the CBC mode of an encryption algorithm also changes its value when it is decrypted and encrypted again. This happens because the random seed value that is used to initialize the encryption algorithm and is included as part of the encrypted value is different for each encryption operation. Because of this, it is impossible to use data that has been encrypted with the CBC mode as a unique key to identify a row in a database. FPE attempts to simplify the transition process by preserving the formatting and length of the original data, allowing a drop-in replacement of plaintext values with their ciphertexts in legacy applications. == Comparison to truly random permutations == Although a truly random permutation is the ideal FPE cipher, for large domains it is infeasible to pre-generate and remember a truly random permutation. So the problem of FPE is to generate a pseudorandom permutation from a secret key, in such a way that the computation time for a single value is small (ideally constant, but most importantly smaller than O(N)). == Comparison to block ciphers == An n-bit block cipher technically is an FPE on the set {0, ..., 2^n − 1}. If an FPE is needed on one of these standard sized sets (for example, n = 64 for DES and n = 128 for AES) a block cipher of the right size can be used. However, in typical usage, a block cipher is used in a mode of operation that allows it to encrypt arbitrarily long messages, and with an initialization vector as discussed above. In this mode, a block cipher is not an FPE. == Definition of security == In cryptographic literature (see most of the references below), the measure of a "good" FPE is whether an attacker can distinguish the FPE from a truly random permutation. Various types of attackers are postulated, depending on whether they have access to oracles or known ciphertext/plaintext pairs. 
== Algorithms == In most of the approaches listed here, a well-understood block cipher (such as AES) is used as a primitive to take the place of an ideal random function. This has the advantage that incorporation of a secret key into the algorithm is easy. Where AES is mentioned in the following discussion, any other good block cipher would work as well. === The FPE constructions of Black and Rogaway === Implementing FPE with security provably related to that of the underlying block cipher was first undertaken in a paper by cryptographers John Black and Phillip Rogaway, which described three ways to do this. They proved that each of these techniques is as secure as the block cipher that is used to construct it. This means that if the AES algorithm is used to create an FPE algorithm, then the resulting FPE algorithm is as secure as AES because an adversary capable of defeating the FPE algorithm can also defeat the AES algorithm. Therefore, if AES is secure, then the FPE algorithms constructed from it are also secure. In all of the following, E denotes the AES encryption operation that is used to construct an FPE algorithm and F denotes the FPE encryption operation. ==== FPE from a prefix cipher ==== One simple way to create an FPE algorithm on {0, ..., N-1} is to assign a pseudorandom weight to each integer, then sort by weight. The weights are defined by applying an existing block cipher to each integer. Black and Rogaway call this technique a "prefix cipher" and showed it was provably as good as the block cipher used. Thus, to create an FPE on the domain {0,1,2,3}, given a key K apply AES(K) to each integer, giving, for example, weight(0) = 0x56c644080098fc5570f2b329323dbf62 weight(1) = 0x08ee98c0d05e3dad3eb3d6236f23e7b7 weight(2) = 0x47d2e1bf72264fa01fb274465e56ba20 weight(3) = 0x077de40941c93774857961a8a772650d Sorting [0,1,2,3] by weight gives [3,1,2,0], so the cipher is F(0) = 3 F(1) = 1 F(2) = 2 F(3) = 0 This method is only useful for small values of N. 
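
The weight-and-sort construction just described can be sketched in Python. This is a minimal illustration under stated assumptions, not Black and Rogaway's reference implementation: Python's standard library has no AES, so the hypothetical helper `prf_weight` uses HMAC-SHA256 as a stand-in for AES(K), and the function names are invented for this example. The first call reuses the four weights from the worked example above.

```python
import hashlib
import hmac


def prf_weight(key: bytes, value: int) -> int:
    """Pseudorandom weight for `value`.

    Assumption: HMAC-SHA256 stands in for the block cipher AES(K, value).
    """
    digest = hmac.new(key, value.to_bytes(16, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def build_prefix_cipher(weights):
    """Build lookup tables for a prefix cipher on {0, ..., N-1}.

    weights[i] is the pseudorandom weight of integer i. Sorting the
    integers by weight gives the encryption table F; inverting that
    table gives the decryption table.
    """
    n = len(weights)
    encrypt_table = sorted(range(n), key=lambda i: weights[i])
    decrypt_table = [0] * n
    for plaintext, ciphertext in enumerate(encrypt_table):
        decrypt_table[ciphertext] = plaintext
    return encrypt_table, decrypt_table


# Weights copied from the worked example in the text (domain {0, 1, 2, 3}).
example_weights = [
    0x56c644080098fc5570f2b329323dbf62,
    0x08ee98c0d05e3dad3eb3d6236f23e7b7,
    0x47d2e1bf72264fa01fb274465e56ba20,
    0x077de40941c93774857961a8a772650d,
]
enc, dec = build_prefix_cipher(example_weights)
print(enc)  # [3, 1, 2, 0], i.e. F(0) = 3, F(1) = 1, F(2) = 2, F(3) = 0

# The same construction with keyed weights on a ten-element domain.
keyed_weights = [prf_weight(b"demo key", i) for i in range(10)]
enc10, dec10 = build_prefix_cipher(keyed_weights)
```

Both tables hold N entries and need N weight computations to build, which is the cost the surrounding text refers to.
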
For larger values, the size of the lookup table and the required number of encryptions to initialize the table get too big to be practical. ==== FPE from cycle walking ==== If there is a set M of allowed values within the domain of a pseudorandom permutation P (for example, P can be a block cipher like AES), an FPE algorithm can be created from the block cipher by repeatedly applying the block cipher until the result is one of the allowed values (within M).

    CycleWalkingFPE(x) {
        if P(x) is an element of M then
            return P(x)
        else
            return CycleWalkingFPE(P(x))
    }

The recursion is guaranteed to terminate. (Because P is one-to-one and the domain is finite, repeated application of P forms a cycle, so starting with a point in M the cycle will eventually terminate in M.) This has the advantage that the elements of M do not have to be mapped to a consecutive sequence {0,...,N-1} of integers. It has the disadvantage, when M is much smaller than P's domain, that too many iterations might be required for each operation. If P is a block cipher of a fixed size, such as AES, this is a severe restriction on the sizes of M for which this method is efficient. For example, an application may want to encrypt 100-bit values with AES in a way that creates another 100-bit value. With this technique, AES-128-ECB encryption can be applied until it reaches a value which has all of its 28 highest bits set to 0, which will take an average of 2^28 iterations to happen. ==== FPE from a Feistel network ==== It is also possible to make an FPE algorithm using a Feistel network. A Feistel network needs a source of pseudo-random values for the sub-keys for each round, and the output of the AES algorithm can be used as these pseudo-random values. When this is done, the resulting Feistel construction is good if enough rounds are used. 
One way to implement an FPE algorithm using AES and a Feistel network is to use as many bits of AES output as are needed to equal the length of the left or right halves of the Feistel network. If a 24-bit value is needed as a sub-key, for example, it is possible to use the lowest 24 bits of the output of AES for this value. This may not result in the output of the Feistel network preserving the format of the input, but it is possible to iterate the Feistel network in the same way that the cycle-walking technique does to ensure that format can be preserved. Because it is possible to adjust the size of the inputs to a Feistel network, it is possible to make it very likely that this iteration ends very quickly on average. In the case of credit card numbers, for example, there are 10^15 possible 16-digit credit card numbers (accounting for the redundant check digit), and because 10^15 ≈ 2^49.8, using a 50-bit wide Feistel network along with cycle walking will create an FPE algorithm that encrypts fairly quickly on average. === The Thorp shuffle === A Thorp shuffle is like an idealized card-shuffle, or equivalently a maximally-unbalanced Feistel cipher where one side is a single bit. It is easier to prove security for unbalanced Feistel ciphers than for balanced ones. === VIL mode === For domain sizes that are a power of two, and an existing block cipher with a smaller bl
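
The credit card example from the Feistel section (a 50-bit Feistel network combined with cycle walking over the 10^15 values left once the check digit is set aside) can be sketched as follows. This is an illustration under stated assumptions rather than a vetted construction: HMAC-SHA256 from Python's standard library stands in for AES in the round function, the round count of 10 is an arbitrary illustrative choice, and all names are hypothetical.

```python
import hashlib
import hmac

HALF_BITS = 25                    # 50-bit network split into two 25-bit halves
HALF_MASK = (1 << HALF_BITS) - 1
DOMAIN = 10 ** 15                 # 15 free digits; the check digit is excluded
ROUNDS = 10                       # illustrative only, not a security recommendation


def round_value(key: bytes, round_no: int, half: int) -> int:
    """Round function: keep the lowest 25 bits of a keyed pseudorandom output.

    Assumption: HMAC-SHA256 stands in for AES here.
    """
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & HALF_MASK


def feistel_encrypt(key: bytes, x: int) -> int:
    """Permute a 50-bit value with a balanced Feistel network."""
    left, right = x >> HALF_BITS, x & HALF_MASK
    for r in range(ROUNDS):
        left, right = right, left ^ round_value(key, r, right)
    return (left << HALF_BITS) | right


def feistel_decrypt(key: bytes, y: int) -> int:
    """Invert feistel_encrypt by undoing the rounds in reverse order."""
    left, right = y >> HALF_BITS, y & HALF_MASK
    for r in reversed(range(ROUNDS)):
        left, right = right ^ round_value(key, r, left), left
    return (left << HALF_BITS) | right


def fpe_encrypt(key: bytes, x: int) -> int:
    """Cycle-walk: re-apply the permutation until the result is below 10^15."""
    if not 0 <= x < DOMAIN:
        raise ValueError("input must be in the range 0 .. 10^15 - 1")
    y = feistel_encrypt(key, x)
    while y >= DOMAIN:
        y = feistel_encrypt(key, y)
    return y


def fpe_decrypt(key: bytes, y: int) -> int:
    """Walk the same cycle backwards until the result is below 10^15."""
    if not 0 <= y < DOMAIN:
        raise ValueError("input must be in the range 0 .. 10^15 - 1")
    x = feistel_decrypt(key, y)
    while x >= DOMAIN:
        x = feistel_decrypt(key, x)
    return x


key = b"demo key"
plain = 123456781234567           # the 15 leading digits of 1234567812345670
cipher = fpe_encrypt(key, plain)
print(f"{cipher:015d}")           # another 15-digit string
```

Because 10^15 covers roughly 89% of the 2^50 possible outputs, the `while` loops rarely run more than once or twice. A complete card number would still need its check digit recomputed for the encrypted 15 digits.
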

    Read more →
  • Philco computers

    Philco computers

    Philco was one of the pioneers of transistorized computers, also known as second-generation computers. After the company developed the surface-barrier transistor, which was much faster than previous point-contact types, it was awarded contracts for military and government computers. Commercialized derivatives of some of these designs became successful business and scientific computers. The TRANSAC (Transistor Automatic Computer) Model S-1000 was released as a scientific computer. The TRANSAC S-2000 mainframe computer system was first produced in 1958, and a family of compatible machines, with increasing performance, was released over the next several years. However, the mainframe computer market was dominated by IBM. Other companies could not deploy resources for development, customer support and marketing on the scale that IBM could afford, making competition in this segment difficult after the introduction of the IBM 360 family. Philco went bankrupt and was purchased in 1961 by Ford Motor Company, but the computer division carried on until the Philco division of Ford exited the computer business in 1963. The Ford company maintained one Philco mainframe in use until 1981. == The surface-barrier transistor == The surface-barrier transistor developed by Philco in 1953 had a much higher frequency response than the original point-contact transistors. The transistor was made of a thin crystal of germanium, which was electrolytically etched with pits on either side forming a very thin base region, on the order of 5 micrometers. Philco's process for etching was United States patent number 2,885,571. Philco surface-barrier transistors were used in TX-0, and in early models of what would become the DEC PDP product line. Although relatively fast, the small size of the devices limited their power to circuits operating at a few tens of milliwatts. 
== Military and government == Between 1955 and 1957, Philco built transistor computers for use in aircraft, models C-1000, C-1100, and C-1102, intended for airborne real-time applications. By 1957, the C-1102 had been used by a civilian sector customer. The BASICPAC AN/TYK 6V (first delivery in 1961), COMPAC AN/TYK 4V (not completed), and LOGICPAC systems were built for the US Army as transportable computer systems for use with their Fieldata concept of integrated information management. BASICPAC was a transistorized computer with up to 28,672 words of 38-bit core memory (including sign and parity), available in several configurations from a minimum system, to a truck-borne mobile version, to a fully expanded system. The basic clock period was 1 microsecond (which gives a clock rate of 1 MHz), with a 12-microsecond memory access time and a fixed-point multiplication taking 242 microseconds. Input/output was by paper tape reader and punch, or through a teletypewriter. With additional hardware, magnetic tape storage was also available, with up to seven I/O devices. The instruction set had 31 basic operation codes and nine opcodes for I/O. === CXPQ === Philco was contracted by the US Navy to build the CXPQ computer. One model was completed and installed at the David Taylor Model Basin. This design was later adapted to become the commercial TRANSAC S-2000. Only one CXPQ was built. The CXPQ is a 48-bit transistorized computer. === SOLO === In 1955, the National Security Agency through the US Navy contracted with Philco to produce a computer suitable for use as a workstation, with an architecture based on the vacuum-tube computer system called Atlas II already in use at the NSA, and similar to the commercial UNIVAC 1103. At the time, Philco was the largest producer of surface-barrier transistors, which were the only type available with the speed and quantities required for a computer. The SOLO prototype was delivered in 1958, but required extensive debugging at NSA. 
Difficulties were encountered with core memory and power supplies. SOLO used paper tape and teleprinter machines for input and output. SOLO cost about $1 million US, and contained 8,000 transistors. While the system was extensively used for training, testing, research and development, no additional units were ordered. SOLO was removed from active service in 1963. The design of the SOLO became commercialized as Philco's TRANSAC Model S-1000. == Commercial == === S-1000 === The TRANSAC S-1000 was a scientific computer with a 36-bit word length and 4096 words of core memory. It was packaged in a container about the size of a large office desk, and used only 1.2 kilowatts, much less than vacuum-tube-based computers of similar capacity. In a 1961 survey, about 15 S-1000 computer installations had been identified. It weighed about 1,650 pounds (750 kg). === S-2000 === The TRANSAC S-2000 was a large mainframe system intended for both business and scientific work. It had a 48-bit word length and supported calculations in fixed point, floating point and binary-coded decimal formats. The original S-2000 "TRANSAC" (Transistor Automatic Computer) released in 1958 was later designated Model 210; it was used internally at Philco. Similar to the Control Data Corporation Model 1604, it was a 48-bit fully transistorized computer. Three succeeding models were released in the series, all compatible with the software of the original model. The Model 211 was introduced in 1960, using micro-alloy diffused field-effect transistors, requiring significant redesign of circuits compared to the original. The TRANSAC S-2000/Philco 210/211 weighed about 2,000 pounds (910 kg). By 1964, eighteen Model 210, eighteen Model 211 and seven Model 212 systems had been sold. After Philco was purchased by Ford Motor Company, the Model 212 was introduced in 1962 and released in 1963. It had 65,535 words of 48-bit memory. 
Initially made with 6-microsecond core memory, it had better performance than the IBM 7094 transistor computer. It was later upgraded in 1964 to 2-microsecond core memory, which gave the machine floating-point performance greater than the IBM 7030 Stretch computer. A Model 213 was announced in 1964 but never built. By that time competition from IBM had made the Philco computer operations no longer profitable for Ford, and the division was closed down. The Model 212 could carry out a floating-point multiplication in 22 microseconds. Each word contained two 24-bit instructions with 16 bits of address information and eight bits for the opcode. There were 225 different valid opcodes in the Model 212; invalid opcodes were detected and halted the machine. The CPU had an accumulator register of 48 bits, three general-purpose registers of 24 bits, and 32 index registers of 15 bits. Main memory size ranged from 4K words to 64K words. Only the first model had a magnetic drum memory; later editions used tape drives. The Model 212 weighed about 6,500 pounds (3.3 short tons; 2.9 t). Software for the S-2000 initially consisted of TAC (Translator-Assembler-Compiler), and ALTAC, a FORTRAN II-like language with some differences from the IBM 704 FORTRAN implementation. A COBOL compiler was also available, targeted at business applications. The Philco 2400 was the input/output system for the S-2000. Operations such as reading cards or printing were carried out through magnetic tapes, thereby offloading the S-2000 from relatively slow input/output processing. The 2400 had a 24-bit word length and could be supplied with 4K to 32K characters (1K to 8K words) of core memory, rated at 3-microsecond cycle time. The instruction set was aimed at character I/O use. The idea of base registers, implemented in Philco computers, influenced the design of IBM/360. The last Philco TRANSAC S-2000 Model 212 was taken out of service in December 1981, after 19 years of service at Ford.

    Read more →
  • Squeaky Dolphin

    Squeaky Dolphin

    Squeaky Dolphin is a program developed by the Government Communications Headquarters (GCHQ), a British intelligence and security organization, to collect and analyze data from social media networks. The program was first revealed to the general public on NBC on 27 January 2014 based on documents previously leaked by Edward Snowden. == Scope of surveillance == According to a GCHQ document dated August 2012, the program enables broad, real-time surveillance of the following items: YouTube video views; the Like button on Facebook (Facebook has since encrypted the data); Blogspot/Blogger visits; and Twitter, which has however encrypted its communications since this presentation was made. The program can be supplemented with commercially available analytic software to determine which videos are popular among residents of specific cities. The dashboard software chosen was made by Splunk. The presentation, which was originally shown to an NSA audience and was made public by NBC, contains a note saying the program was "Not interested in individuals just broad trends!". However, "according to other Snowden documents" obtained by NBC, in 2010, "GCHQ exploited unencrypted data from Twitter to identify specific users around the world and target them with propaganda."

    Read more →
  • Wispr

    Wispr

    Wispr AI is a software company founded in 2021 by Tanay Kothari and Sahaj Garg that develops voice-based interfaces for computers and other devices. The company’s main product, Wispr Flow, is an AI-powered speech-to-text application available on macOS, Windows and iOS. == History == Wispr was founded in 2021 with the goal of building a non-invasive wearable device that would allow users to control smartphones without touch input. The device was intended to translate neurological signals into actions and to enable silent text entry by mouthing words, drawing on techniques similar to brain–computer interfaces. Early funding was directed toward this hardware-focused effort. After around three years of development, Wispr concluded that contemporary AI systems were not sufficient for the requirements of the wearable device. The company shifted its focus to Flow voice dictation software, the software layer originally built for the wearable, and in 2024 released a macOS application based on this platform. == Wispr Flow == Wispr Flow (often referred to as Flow) is a speech-to-text application for macOS, Windows and iOS. It provides real-time dictation and transcription in more than 100 languages and can operate across applications, including email clients, messaging platforms and chatbots. In June 2025 Wispr released an iOS version that functions as a third-party keyboard, allowing voice input in any app. == Technology == Wispr Flow is based on automatic speech recognition (ASR) and other AI models. The system adapts to individual users over time, learning their vocabulary and preferred style with the aim of reducing manual editing. Flow operates through configurable “Flow Sessions”, defined as time windows during which the app has access to the microphone; users can set session timeouts or disable automatic time limits. 
== Users and Adoption == Wispr initially targeted users such as venture capitalists, entrepreneurs and executives who process large volumes of text and often work in private or flexible environments. The user base later expanded via platforms such as Product Hunt to students, software developers, writers, lawyers and consultants. Flow has also been adopted by users with conditions such as ADHD, dyslexia, paralysis and carpal tunnel syndrome. About 40% of users are in the United States, 30% in Europe and the remaining 30% in other regions. More than 30% of users come from non-technical backgrounds. Flow supports 104 languages, with approximately 40% of dictations in English and 60% in other languages, including Spanish, French, German, Dutch, Hindi and Mandarin. Wispr has reported monthly user growth above 50%, a six-month active-user retention rate of about 80%, a payment rate around 19%, and revenue of approximately US$3.8 million between July 2024 and July 2025. == Development == Wispr has announced plans for an Android application and maintains waiting lists for Android, Linux and web versions of Flow. The company is developing shared-context features for teams so that the software can recognize common terminology within organizations and has stated that it aims to evolve Flow into a broader AI assistant for tasks such as messaging, note-taking and reminders. Wispr has also reported working with unnamed AI hardware partners on interaction layers for future devices. == Funding == In 2025 Wispr raised US$30 million in a Series A funding round led by Menlo Ventures, with participation from NEA, 8VC and several individual investors, including Evan Sharp and Henry Ward. Earlier investors include Neo, MVP Ventures and AIX Ventures. In November of that same year, the company raised a US$25 million Series A extension led by Notable Capital, with participation from Flight Fund, bringing its total funding to US$81 million. 
Wispr competes with other AI-based dictation and voice-input tools, including Aqua, Talktastic, Superwhisper and Betterdication.

    Read more →
  • Thirst trap

    Thirst trap

    A thirst trap is a type of social media post intended to entice viewers sexually. It refers to a viewer's "thirst", a colloquialism likening sexual frustration to dehydration, implying desperation, with the afflicted individual being described as "thirsty". The phrase entered into the lexicon in the late 1990s, but is most related to Internet slang that developed in the early 2010s. Its meaning has changed over time, previously referring to a graceless need for approval, affection or attention. == History == The term thirst trap originated within selfie culture, though its precise origins remain unclear. An early use of the phrase with reference to dehydration appears in the 1999 book Running for Dummies by Florence Griffith Joyner and John Hanc, where it referred to the deceptive sensation of thirst being quenched after initial fluid intake, advising continued hydration to avoid the so-called "thirst trap." The modern usage of thirst trap resurfaced around 2011 on platforms such as Twitter and Urban Dictionary, coinciding with the growing popularity of Snapchat, Instagram, and dating apps like Tinder and Grindr. In 2011, Urban Dictionary defined it as "any statement used to intentionally create attention or 'thirst'." By 2018, the term had entered mainstream discourse, appearing in outlets such as The New York Times and GQ without the need for explanation. == Usage of the term == Often, the term thirst trap describes an attractive picture of an individual that they post online. Thirst trap can also describe a digital heartthrob. For instance, former Canadian prime minister Justin Trudeau has been described as a political thirst trap. It has also been described as a modern form of "fishing for compliments". == Motivation == Thirst trapping may be driven by a variety of motives. Individuals often seek attention through "likes" and comments on social media, which can offer a temporary sense of validation and improved self-esteem. 
It can also serve as an outlet for expressing one's sexuality or enhancing a personal brand. In some cases, sharing such content may provide financial gain. Others might post thirst traps to cope with emotional distress, such as after breakup, or to spite a former lover. Sharing a thirst trap has also been used as a way to connect in times of social isolation (e.g. COVID-19 pandemic). From a physiological standpoint, endorphins and neurotransmitters like oxytocin and dopamine are released during sexual contact. It has been speculated outside of the academic setting that sharing and engaging with thirst traps may elicit similar pleasure responses. == Methodology == Methodologies have developed to take an optimal thirst trap photo. Reporting for Vice magazine, Graham Isador found several of his social network contacts spent a lot of time considering how to take the best photo and what text they should use. They considered angles and lighting. Sometimes they made use of the self-timer feature available on some cameras. Often, body parts are put on display without being too explicit (e.g. bulges of male genitalia, breast cleavage, abdominal muscles, pectoral muscles, backs, buttocks). Often, the thirst trap is accompanied by a caption. For instance, in October 2019, actress Tracee Ellis Ross posted bikini pictures on Instagram with a caption that included the message: "I've worked so hard to feel good in my skin and to build a life that truly matches me and I'm in it and it feels good. ... No filter, no retouch 47 year old thirst trap! Boom!" On Instagram, #ThirstTrapThursdays is a popular tag. Followers reply in turn after a posting. == Variations == "Gatsbying" is a variation of the thirst trap, where one puts posts on social media to attract the attention of a particular individual. The term alludes to the novel The Great Gatsby where the character Jay Gatsby would throw extravagant parties to attract the attention of his love interest, Daisy. 
"Instagrandstanding" is an alternative name for this. "Wholesome trapping" has developed, where one posts pictures of more meaningful aspects of life, such as spending time with friends or doing outdoor activities. == Criticism == Psychotherapist Lisa Brateman has criticized thirst traps as an unhealthy method of receiving external validation. This desire for external validation can be addictive. Thirst traps can cause pressure to maintain a good physical appearance, and therefore cause self-esteem issues. Additionally, thirst traps are often highly choreographed and thus present a distorted perception of reality. The manufacturing of thirst traps can be limited when one enters a relationship or with time as the body ages. In some cases, thirst traps can lead to harassment and online bullying. In April 2020, model Chrissy Teigen posted a video of herself wearing a black one-piece swimsuit, and she received a multitude of negative comments that constituted bullying and body shaming.

    Read more →
  • Sentiment analysis

    Sentiment analysis

    Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. == Types == A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Precursors to sentimental analysis include the General Inquirer, which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale. Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang who applied different methods for detecting the polarity of product reviews and movie reviews respectively. 
This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). First steps to bringing together various approaches (learning, lexical, knowledge-based, etc.) were taken in the 2004 AAAI Spring Symposium, where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Even though in most statistical classification methods the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as maximum entropy classifiers and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways of operating with a neutral class. Either the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). 
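
The one-step, three-way approach can be illustrated with a minimal sketch. This is a small multinomial naive Bayes classifier written from scratch with add-one smoothing, not NLTK's implementation; the training sentences, the `train` and `predict_proba` helper names, and the label strings are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy training data, invented purely for illustration.
TRAIN = [
    ("i love this phone it is great", "positive"),
    ("great battery and excellent screen", "positive"),
    ("i hate this phone it is terrible", "negative"),
    ("terrible battery and awful screen", "negative"),
    ("the phone has a battery and a screen", "neutral"),
    ("the box contains the phone and a charger", "neutral"),
]


def train(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict_proba(text, model):
    """Return a probability distribution over all three classes."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        # Log prior for the class.
        score = math.log(label_counts[label] / total_docs)
        # Denominator for add-one (Laplace) smoothing.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        log_scores[label] = score
    # Normalise the log scores into probabilities (log-sum-exp trick).
    top = max(log_scores.values())
    exps = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exps.values())
    return {label: value / norm for label, value in exps.items()}


model = train(TRAIN)
probs = predict_proba("great screen", model)
print(max(probs, key=probs.get), probs)
```

Because the classifier returns a full distribution rather than a single label, a caller could also treat a text as neutral whenever no class clearly dominates; that thresholding choice is a design decision outside this sketch.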
Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles. A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text. There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis and detection of emotions. 
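
The scaling approach described above, with scores on a −10 to +10 scale adjusted by nearby modifiers, might look like the following sketch. The word lists, the numeric weights, the two-token look-back window, and the `score_sentence` function name are all hypothetical choices made for illustration; real lexicons are far larger and tuned empirically.

```python
# Hypothetical lexicon on a -10..+10 scale; values are illustrative only.
LEXICON = {"good": 6, "great": 8, "okay": 1, "bad": -6, "terrible": -9}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}    # strengthen the sentiment
DIMINISHERS = {"slightly": 0.5, "somewhat": 0.7}  # relax the sentiment
NEGATORS = {"not", "never", "no"}                 # flip the sign


def clamp(value, low=-10.0, high=10.0):
    """Keep a score inside the -10..+10 range."""
    return max(low, min(high, value))


def score_sentence(sentence):
    """Average the modifier-adjusted scores of sentiment words in a sentence."""
    tokens = [t.strip(".,!?;:") for t in sentence.lower().split()]
    scores = []
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        value = float(LEXICON[token])
        # Look at up to two preceding tokens for modifiers.
        for prev in tokens[max(0, i - 2):i]:
            if prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
            elif prev in DIMINISHERS:
                value *= DIMINISHERS[prev]
            elif prev in NEGATORS:
                value = -value
        scores.append(clamp(value))
    # A sentence with no sentiment words is scored as neutral (0.0).
    return sum(scores) / len(scores) if scores else 0.0


for text in ["The food was good.", "The food was not good.",
             "The food was very good.", "The service was slightly bad."]:
    print(text, score_sentence(text))
```

Averaging per sentence is only one way to aggregate; a concept-level variant would instead attach each adjusted score to the noun or entity it modifies.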
=== Subjectivity/objectivity identification === This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance. Subjectivity and objectivity identification is an emerging subtask of sentiment analysis that uses syntactic and semantic features, together with machine learning, to identify whether a sentence or document contains facts or opinions. Awareness of the distinction between facts and opinions is not recent, having possibly first been presented by Carbonell at Yale University in 1979. The term objective refers to text that carries factual information. Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.' The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states'. In the example below, 'We Americans' reflects a private state. Moreover, the target entity commented on by the opinions can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010). Furthermore, three types of attitudes were observed by Liu (2010): 1) positive opinions, 2) neutral opinions, and 3) negative opinions. Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem. 
For each class, collections of word or phrase indicators are defined to locate desirable patterns in unannotated text. For subjective expressions, a different word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in linguistics and natural language processing, as stated in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand to automated feature learning. At the moment, automated learning methods can be further separated into supervised and unsupervised machine learning. Pattern extraction with machine learning on annotated and unannotated text has been explored extensively by academic researchers. However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) words with few usages, 5) time sensitivity, and 6) ever-growing volume. Metaphorical expressions. Text that contains metaphorical expressions may affect the performance of the extraction. Besides, metaphors take different forms, which may have been contribu

    Read more →