AI Art That Looks Real

AI Art That Looks Real — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • IT baseline protection

    IT baseline protection

    The IT baseline protection (German: IT-Grundschutz) approach from the German Federal Office for Information Security (BSI) is a methodology to identify and implement computer security measures in an organization. The aim is the achievement of an adequate and appropriate level of security for IT systems. To reach this goal the BSI recommends "well-proven technical, organizational, personnel, and infrastructural safeguards". Organizations and federal agencies show their systematic approach to secure their IT systems (e.g. Information Security Management System) by obtaining an ISO/IEC 27001 Certificate on the basis of IT-Grundschutz. == Overview baseline security == The term baseline security signifies standard security measures for typical IT systems. It is used in various contexts with somewhat different meanings. For example: Microsoft Baseline Security Analyzer: Software tool focused on Microsoft operating system and services security Cisco security baseline: Vendor recommendation focused on network and network device security controls Nortel baseline security: Set of requirements and best practices with a focus on network operators ISO/IEC 13335-3 defines a baseline approach to risk management. This standard has been replaced by ISO/IEC 27005, but the baseline approach was not taken over yet into the 2700x series. There are numerous internal baseline security policies for organizations, The German BSI has a comprehensive baseline security standard, that is compliant with the ISO/IEC 27000-series == BSI IT baseline protection == The foundation of an IT baseline protection concept is initially not a detailed risk analysis. It proceeds from overall hazards. Consequently, sophisticated classification according to damage extent and probability of occurrence is ignored. Three protection needs categories are established. With their help, the protection needs of the object under investigation can be determined. Based on these, appropriate personnel, technical, organizational and infrastructural security measures are selected from the IT Baseline Protection Catalogs. The Federal Office for Security in Information Technology's IT Baseline Protection Catalogs offer a "cookbook recipe" for a normal level of protection. Besides probability of occurrence and potential damage extents, implementation costs are also considered. By using the Baseline Protection Catalogs, costly security analyses requiring expert knowledge are dispensed with, since overall hazards are worked with in the beginning. It is possible for the relative layman to identify measures to be taken and to implement them in cooperation with professionals. The BSI grants a baseline protection certificate as confirmation for the successful implementation of baseline protection. In stages 1 and 2, this is based on self declaration. In stage 3, an independent, BSI-licensed auditor completes an audit. Certification process internationalization has been possible since 2006. ISO/IEC 27001 certification can occur simultaneously with IT baseline protection certification. (The ISO/IEC 27001 standard is the successor of BS 7799-2). This process is based on the new BSI security standards. This process carries a development price which has prevailed for some time. Corporations having themselves certified under the BS 7799-2 standard are obliged to carry out a risk assessment. To make it more comfortable, most deviate from the protection needs analysis pursuant to the IT Baseline Protection Catalogs. The advantage is not only conformity with the strict BSI, but also attainment of BS 7799-2 certification. Beyond this, the BSI offers a few help aids like the policy template and the GSTOOL. One data protection component is available, which was produced in cooperation with the German Federal Commissioner for Data Protection and Freedom of Information and the state data protection authorities and integrated into the IT Baseline Protection Catalog. This component is not considered, however, in the certification process. == Baseline protection process == The following steps are taken pursuant to the baseline protection process during structure analysis and protection needs analysis: The IT network is defined. IT structure analysis is carried out. Protection needs determination is carried out. A baseline security check is carried out. IT baseline protection measures are implemented. Creation occurs in the following steps: IT structure analysis (survey) Assessment of protection needs Selection of actions Running comparison of nominal and actual. === IT structure analysis === An IT network includes the totality of infrastructural, organizational, personnel, and technical components serving the fulfillment of a task in a particular information processing application area. An IT network can thereby encompass the entire IT character of an institution or individual division, which is partitioned by organizational structures as, for example, a departmental network, or as shared IT applications, for example, a personnel information system. It is necessary to analyze and document the information technological structure in question to generate an IT security concept and especially to apply the IT Baseline Protection Catalogs. Due to today's usually heavily networked IT systems, a network topology plan offers a starting point for the analysis. The following aspects must be taken into consideration: The available infrastructure, The organizational and personnel framework for the IT network, Networked and non-networked IT systems employed in the IT network. The communications connections between IT systems and externally, IT applications run within the IT network. === Protection needs determination === The purpose of the protection needs determination is to investigate what protection is sufficient and appropriate for the information and information technology in use. In this connection, the damage to each application and the processed information, which could result from a breach of confidentiality, integrity or availability, is considered. Important in this context is a realistic assessment of the possible follow-on damages. A division into the three protection needs categories "low to medium", "high" and "very high" has proved itself of value. "Public", "internal" and "secret" are often used for confidentiality. === Modelling === Heavily networked IT systems typically characterize information technology in government and business these days. As a rule, therefore, it is advantageous to consider the entire IT system and not just individual systems within the scope of an IT security analysis and concept. To be able to manage this task, it makes sense to logically partition the entire IT system into parts and to separately consider each part or even an IT network. Detailed documentation about its structure is prerequisite for the use of the IT Baseline Protection Catalogs on an IT network. This can be achieved, for example, via the IT structure analysis described above. The IT Baseline Protection Catalog’s' components must ultimately be mapped onto the components of the IT network in question in a modelling step. === Baseline security check === The baseline security check is an organisational instrument offering a quick overview of the prevailing IT security level. With the help of interviews, the status quo of an existing IT network (as modelled by IT baseline protection) relative to the number of security measures implemented from the IT Baseline Protection Catalogs are investigated. The result is a catalog in which the implementation status "dispensable", "yes", "partly", or "no" is entered for each relevant measure. By identifying not yet, or only partially, implemented measures, improvement options for the security of the information technology in question are highlighted. The baseline security check gives information about measures, which are still missing (nominal vs. actual comparison). From this follows what remains to be done to achieve baseline protection through security. Not all measures suggested by this baseline check need to be implemented. Peculiarities are to be taken into account! It could be that several more or less unimportant applications are running on a server, which have lesser protection needs. In their totality, however, these applications are to be provided with a higher level of protection. This is called the (cumulation effect). The applications running on a server determine its need for protection. Several IT applications can run on an IT system. When this occurs, the application with the greatest need for protection determines the IT system’s protection category. Conversely, it is conceivable that an IT application with great protection needs does not automatically transfer this to the IT system. This may happen because the IT system is configured redundantly, or because only an inconsequential part is running on it. This is called the (distribution effect). This is the case, fo

    Read more →
  • Protecting Kids From Social Media Act

    Protecting Kids From Social Media Act

    Protecting Kids on Social Media Act or HB 1891 is an American law that was introduced by William Lamberth of Sumner County, Tennessee and was signed into law by Tennessee's governor on May 2, 2024. The bill requires social media websites such as X, YouTube, TikTok, Facebook and others to verify the age of users and if those users are under 18, they must have parental consent. == Progress == The law passed the Tennessee State Legislature with little opposition: the bill had only two no votes in the House from Aftyn Behn and Vincent B. Dixie, and it had zero no votes in the Senate. == Bill summary == Every social media company must verify the age of new users after the law takes effect, and if the user had created an account before the law took effect, they must verify the age of the person attempting to access the account within 14 days. If the new user or the user who originally owned an account is under 18 years of age, they must get parental consent and the third party or social media company must not retain the data from the age verification process or obtaining parental consent. Parents who are account holders of those under 18 can view the privacy settings, set daily time restrictions, and implement breaks during which the minor cannot access the account. The law is enforced by the Attorney General of Tennessee and went into effect on January 1, 2025. == Lawsuit == On October 3, 2024, the trade association NetChoice filed a lawsuit against Tennessee Attorney General Jonathan Skrmetti in the Middle District Court of Tennessee, claiming that the law violates the First Amendment. The Judge for the case is William L. Campbell Jr. An initial case management conference was originally scheduled for December 4, 2024, however it was delayed because of the Supreme Court case United States v. Skrmetti, recommending that the conference be delayed after January 20, 2025. On February 14, 2025, Judge Eli Richardson denied NetChoice's motion for a temporary restraining order because it would disrupt the status quo of the case.

    Read more →
  • Ultra (cryptography)

    Ultra (cryptography)

    Ultra was the designation adopted by British military intelligence in June 1941 for wartime signals intelligence obtained by breaking high-level encrypted enemy radio and teleprinter communications at the Government Code and Cypher School (GC&CS) at Bletchley Park. Ultra eventually became the standard designation among the western Allies for all such intelligence. The name arose because the intelligence obtained was considered more important than that designated by the highest British security classification then used (Most Secret) and so was regarded as being Ultra Secret. Several other cryptonyms had been used for such intelligence. The code name "Boniface" was used as a cover name for Ultra. In order to ensure that the successful code-breaking did not become apparent to the Germans, British intelligence created a fictional MI6 master spy, Boniface, who controlled a fictional series of agents throughout Germany. Information obtained through code-breaking was often attributed to the human intelligence from the Boniface network. The U.S. used the codename Magic for its decrypts from Japanese sources, including the "Purple" cipher. Much of the German cipher traffic was encrypted on the Enigma machine. Used properly, the German military Enigma would have been virtually unbreakable; in practice, shortcomings in operation allowed it to be broken. The term "Ultra" has often been used almost synonymously with "Enigma decrypts". However, Ultra also encompassed decrypts of the German Lorenz SZ 40/42 machines that were used by the German High Command, and the Hagelin machine. Many observers, at the time and later, regarded Ultra as immensely valuable to the Allies. Winston Churchill was reported to have told King George VI, when presenting to him Stewart Menzies (head of the Secret Intelligence Service and the person who controlled distribution of Ultra decrypts to the government): "It is thanks to the secret weapon of General Menzies, put into use on all the fronts, that we won the war!" F. W. Winterbotham quoted the western Supreme Allied Commander, Dwight D. Eisenhower, at war's end describing Ultra as having been "decisive" to Allied victory. Sir Harry Hinsley, Bletchley Park veteran and official historian of British Intelligence in World War II, made a similar assessment of Ultra, saying that while the Allies would have won the war without it, "the war would have been something like two years longer, perhaps three years longer, possibly four years longer than it was." However, Hinsley and others have emphasized the difficulties of counterfactual history in attempting such conclusions, and some historians, such as John Keegan, have said the shortening might have been as little as the three months it took the United States to deploy the atomic bomb. == Sources of intelligence == Most Ultra intelligence was derived from reading radio messages that had been encrypted with cipher machines, complemented by material from radio communications using traffic analysis and direction finding. In the early phases of the war, particularly during the eight-month Phoney War, the Germans could transmit most of their messages using land lines and so had no need to use radio. This meant that those at Bletchley Park had some time to build up experience of collecting and starting to decrypt messages on the various radio networks. German Enigma messages were the main source, with those of the German air force (the Luftwaffe) predominating, as they used radio more and their operators were particularly ill-disciplined. === German === ==== Enigma ==== "Enigma" refers to a family of electro-mechanical rotor cipher machines. These produced a polyalphabetic substitution cipher and were widely thought to be unbreakable in the 1920s, when a variant of the commercial Model D was first used by the Reichswehr. The German Army (Heer), Navy, Air Force, Nazi party, Gestapo and German diplomats used Enigma machines in several variants. Abwehr (German military intelligence) used a four-rotor machine without a plugboard and Naval Enigma used different key management from that of the army or air force, making its traffic far more difficult to cryptanalyse; each variant required different cryptanalytic treatment. The commercial versions were not as secure and Dilly Knox of GC&CS is said to have broken one before the war. German military Enigma was first broken in December 1932 by Marian Rejewski and the Polish Cipher Bureau, using a combination of brilliant mathematics, the services of a spy in the German office responsible for administering encrypted communications, and good luck. The Poles read Enigma to the outbreak of World War II and beyond, in France. At the turn of 1939, the Germans made the systems ten times more complex, which required a tenfold increase in Polish decryption equipment, which they could not meet. On 25 July 1939, the Polish Cipher Bureau handed reconstructed Enigma machines and their techniques for decrypting ciphers to the French and British. Gordon Welchman wrote, Ultra would never have got off the ground if we had not learned from the Poles, in the nick of time, the details both of the German military Enigma machine, and of the operating procedures that were in use. At Bletchley Park, some of the key people responsible for success against Enigma included mathematicians Alan Turing and Hugh Alexander and, at the British Tabulating Machine Company, chief engineer Harold Keen. After the war, interrogation of German cryptographic personnel led to the conclusion that German cryptanalysts understood that cryptanalytic attacks against Enigma were possible but were thought to require impracticable amounts of effort and investment. The Poles' early start at breaking Enigma and the continuity of their success gave the Allies an advantage when World War II began. ==== Lorenz cipher ==== In June 1941, the Germans started to introduce on-line stream cipher teleprinter systems for strategic point-to-point radio links, to which the British gave the code-name Fish. Several systems were used, principally the Lorenz SZ 40/42 (codenamed "Tunny" by the British) and Geheimfernschreiber ("Sturgeon"). These cipher systems were cryptanalysed, particularly Tunny, which the British thoroughly penetrated. It was eventually attacked using Colossus machines, which were the first digital programme-controlled electronic computers. In many respects the Tunny work was more difficult than for the Enigma, since the British codebreakers had no knowledge of the machine producing it and no head-start such as that the Poles had given them against Enigma. Although the volume of intelligence derived from this system was much smaller than that from Enigma, its importance was often far higher because it produced primarily high-level, strategic intelligence that was sent between Wehrmacht high command (Oberkommando der Wehrmacht, OKW). The eventual bulk decryption of Lorenz-enciphered messages contributed significantly, and perhaps decisively, to the defeat of Nazi Germany. Nevertheless, the Tunny story has become much less well known among the public than the Enigma one. At Bletchley Park, some of the key people responsible for success in the Tunny effort included mathematicians W. T. "Bill" Tutte and Max Newman and electrical engineer Tommy Flowers. === Italian === In June 1940, the Italians were using book codes for most of their military messages, except for the Italian Navy, which in early 1941 had started using a version of the Hagelin rotor-based cipher machine C-38. This was broken from June 1941 onwards by the Italian subsection of GC&CS at Bletchley Park. === Japanese === In the Pacific theatre, a Japanese cipher machine, called "Purple" by the Americans, was used for highest-level Japanese diplomatic traffic. It produced a polyalphabetic substitution cipher, but unlike Enigma, was not a rotor machine, being built around electrical stepping switches. It was broken by the US Army Signal Intelligence Service and disseminated as Magic. Detailed reports by the Japanese ambassador to Germany were encrypted on the Purple machine. His reports included reviews of German assessments of the military situation, reviews of strategy and intentions, reports on direct inspections by the ambassador (in one case, of Normandy beach defences), and reports of long interviews with Hitler. The Japanese are said to have obtained an Enigma machine in 1937, although it is debated whether they were given it by the Germans or bought a commercial version, which, apart from the plugboard and internal wiring, was the German Heer/Luftwaffe machine. Having developed a similar machine, the Japanese did not use the Enigma machine for their most secret communications. The chief fleet communications code system used by the Imperial Japanese Navy was called JN-25 by the Americans, and by early 1942 the US Navy had made considerable progress in decrypting Japanese naval messages. The US Army also made progress on the

    Read more →
  • SFINKS

    SFINKS

    Sfinks (Polish for "Sphynx") was also the initial name of the Janusz A. Zajdel Award In cryptography, SFINKS is a stream cypher algorithm developed by An Braeken, Joseph Lano, Nele Mentens, Bart Preneel, and Ingrid Verbauwhede. It includes a message authentication code. It has been submitted to the eSTREAM Project of the eCRYPT network. In 2005, Nicolas T. Courtois noted that, while the cipher is elegant and secure against some simple algebraic attacks, it is vulnerable to more elaborate known attacks.

    Read more →
  • Productivity software

    Productivity software

    Productivity software (also called personal productivity software or office productivity software) is application software used for producing information (such as documents, presentations, worksheets, databases, charts, graphs, digital paintings, electronic music and digital video). Its names arose from it increasing productivity, especially of individual office workers, from typists to knowledge workers, although its scope is now wider than that. Office suites, which brought word processing, spreadsheet, and relational database programs to the desktop in the 1980s, are the core example of productivity software. They revolutionized the office with the magnitude of the productivity increase they brought as compared with the pre-1980s office environments of typewriters, paper filing, and handwritten lists and ledgers. In the United States, as of 2015, some 78% of "middle-skill" occupations (those that call for more than a high school diploma but less than a bachelor's degree) required the use of productivity software. == Details == Productivity software traditionally runs directly on a computer. For example, Plus/4 model of computer contains in ROM for applications of productivity software. Productivity software is one of the reasons people use personal computers. == Office suite == An office suite is a bundle of productivity software (a software suite) intended to be used by office workers. The components are generally distributed together, have a consistent user interface and usually can interact with each other, sometimes in ways that the operating system would not normally allow. The earliest office suite for personal computers was MicroPro International's StarBurst in the early 1980s, comprising the WordStar word processor, the CalcStar spreadsheet and the DataStar database software. Other suites arose in the 1980s, and Microsoft Office came to dominate the market in the 1990s, a position it retains as of 2024. During the 1990s, office suite products gained popularity by offering bundles of applications that, when bought as part of a suite, effectively discounted the individual applications, with four or five applications being bundled for the price of two applications bought separately. When faced with such potential savings, customers could be "tempted by the suite, rather than the value of a particular product", and by 1994 more than 60 percent of the sales of Microsoft Word and around 70 percent of the sales of Microsoft Excel were as part of sales of Microsoft Office. Such considerations had an impact on vendors of individual applications, often smaller companies, raising concerns that office suites were "stifling innovation", and even established vendors such as Borland and WordPerfect were having to adapt to the suite phenomenon, Borland ultimately deciding to sell its Quattro Pro spreadsheet to WordPerfect as the latter sought to assemble its own suite product. The dominant suite vendors, Microsoft and Lotus, downplayed competition and innovation concerns, claiming that users were still able to exercise choice and that "user-driven development" was guiding the evolution of office suites. Another view was that component-based software would eventually emerge, focusing development on more specialised components used by productivity software, empowering "a plethora of third-party developers", and that a "mix and match" approach of such components would adapt to the user's way of working. === Office suite components === The base components of office suites are: Word processor Spreadsheet Presentation program Other components include: Database software Graphics suite (raster graphics editor, vector graphics editor, image viewer) Desktop publishing software Formula editor Diagramming software Email client Communication software Personal information manager Notetaking Groupware Project management software Table (information) Web log analysis software

    Read more →
  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is a behavior that marks the end of a transaction and provides Atomicity, Consistency, Isolation, and Durability (ACID) in transactions. The submission records are stored in the submission log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is giving up tentative changes to the transaction, which is rolled back. Due to the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) first proposed by Jim Gray, which is the fundamental core of distributed transaction management. Subsequently, the Three-phase Commit (3PC), Hypothesis Commit (PC), Hypothesis Abort (PA), and Optimistic Commit protocols gradually emerged, solving the problems of blocking and fault recovery. Today, new fields such as e-commerce payment and blockchain technology are emerging, and submission protocols play a significant role in various business areas. By effectively handling transactions, resolving faults and recovering problems, the commit protocol becomes crucial in ensuring the reliability and consistency of data management. == History == The concept of Commit originated in the late 1960s and early 1970s, when computer technology was rapidly advancing and data management was becoming an important requirement in business and finance. Enterprises have gradually replaced the traditional paper records with computers, which has fully improved the work efficiency. The reliability and consistency of data have become a necessary requirement. Transaction management at this stage is relatively simple, limited to using a single computer for processing. It merely effectively records the changes in data to ensure that the data remains stable after the transaction is completed or terminated. In the late 1970s, as database systems moved from a single calculator operation to multiple distributed collaborations, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the famous two-phase Commit Protocol (2PC), which became an effective solution for distributed transaction management, successfully managing data synchronization problems between multiple nodes. However, this commit protocol has some potential transaction blocking problems when nodes fail. In the early 1980s, researchers discovered that although the two-step commit protocol was effective at synchronizing data, there could be long waits and even system crashes, with limitations. To improve this problem, people have begun to explore new and effective methods, including enhancing efficiency by reducing message communication during the protocol process. IBM's R database introduced the Assumed Commit and Assumed abort protocols, which contributed significantly to transaction management efficiency. These two protocols have greatly improved the processing efficiency of distributed transactions by reducing communication overhead and have become an important breakthrough in the technology of transaction commit protocols. By the early 1990s, with the increase in business demands and the complexity of transactions, enterprises required higher efficiency in distributed transaction processing. In order to adapt to the needs of different environments, the scientific community has gradually developed various variants of commit protocols to provide more flexible transaction management options for different needs. For example, the three-phase commit protocol promotes the commit of transactions more effectively and reduces the occurrence of blocking problems by adding a pre-commit protocol and a timeout mechanism. In the 21st century, with the popularization of mobile Internet and wireless technology, the commit protocol has been further developed, and researchers have begun to pay attention to how to reduce the blocking in the transaction process to solve the problem of broadband limitation, battery life and network instability in the mobile environment. The proposal of optimistic commit protocol marks the extension of commit technology from traditional database to the emerging mobile data field. This protocol allows transactions to temporarily use unconfirmed data, improving the user experience in cases of poor network conditions. In recent years, with the rise of blockchain and decentralized technologies, submission protocols and consensus mechanisms have gradually merged. These consensus algorithms play a role in tamper-proofing and preventing malicious attacks on node pairs in a decentralized environment. This enables commit to no longer be confined to the scope of traditional database management, but to become the core technology of trust computing and distributed ledgers, further expanding the application field of commit in the digital age. This integration has brought about extensive application impacts. Each transaction can achieve the effect of tracking global submissions through the verification of the consensus mechanism, becoming an important technical foundation for promoting the circulation of digital assets, the operation of cryptocurrencies and decentralized applications. == Commit Protocol Types == In the world of data management, a transaction is a series of database operations, such as bank transfers and order submission. In order to ensure the accuracy, consistency, and security of the data, transactions are usually completed completely, or cancelled completely, leaving no partially completed results. Commit protocol is the method used to coordinate this process. Different protocols are applicable to different submission scenarios and have their own advantages and disadvantages. There are four major commit protocols. === Two-Phase Commit (2PC) === The two-phase commit protocol is the most classic and broadest approach to distributed transactions, which includes both a preparation phase and a commit phase. This commit protocol is designed to allow the database coordinator to determine if all participating nodes agree. The preparation phase is the phase in which the coordination node sends a ready to commit request to all nodes participating in the transaction. The commit phase is a global commit after all participating nodes are ready, and if no agreement is reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is the easiest to operate and widely used, its obvious drawback is that it can cause transactions to be blocked for a long time when nodes fail, resulting in a decline in system performance and making it difficult to terminate or continue immediately. === Three-Phase Commit (3PC) === The three-phase commit protocol is an improved non-blocking protocol based on 2PC, which is divided into three stages: preparation, pre-commit and commit. Firstly, each node sends a "preparation" request. After confirmation, a "pre-submission" stage is added. At this point, each node has completed most of the preparatory work and is waiting for the final confirmation. Finally, in the formal commit stage, after all nodes send the "commit" request, the transaction is completed and committed. Compared with 2PC, it increases the timeout mechanism, avoids the blocking problem caused by single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly optimizes transaction reliability, but adds additional overhead for message transmission and state maintenance. It is more suitable for distributed application scenarios with high transaction sensitivity and no acceptance of long waiting times. === Presumed Commit (PC) and Presumed Abort (PA) === Presumed Commit (PC) is the default that the transaction will be committed successfully and rollback will be notified unless an anomaly is encountered. This commit reduces the message overhead and logging costs of a normal commits. Presumed Abort (PA) is assumed that the default state of the transaction is a rollback and will only be committed when all nodes have explicitly agreed. This commit is applicable to transactions that are not updated frequently or have a low probability of successful commit. The IBM R Distributed Database management System was the first to propose and practice the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management. === Optimistic Commit Protocol === With the rise of the Internet, the previous commit protocols are facing new challenges, especially in mobile scenarios with unstable networks. Excessively long transaction waiting times can affect the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing to avoid wait times. This type of commit is suitable f

    Read more →
  • Sentiment analysis

    Sentiment analysis

    Sentiment analysis (also known as opinion mining) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. With the rise of deep language models, such as RoBERTa, more difficult data domains can be analyzed, e.g., news texts where authors typically express their opinion/sentiment less explicitly. == Types == A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Precursors to sentimental analysis include the General Inquirer, which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale. Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale). First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles. A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text. There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis and detection of emotions. === Subjectivity/objectivity identification === This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance. Subjective and objective identification, emerging subtasks of sentiment analysis to use syntactic, semantic features, and machine learning knowledge to identify if a sentence or document contains facts or opinions. Awareness of recognizing factual and opinions is not recent, having possibly first presented by Carbonell at Yale University in 1979. The term objective refers to the incident carrying factual information. Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.' The term subjective describes the incident contains non-factual information in various forms, such as personal opinions, judgment, and predictions, also known as 'private states'. In the example down below, it reflects a private states 'We Americans'. Moreover, the target entity commented by the opinions can take several forms from tangible product to intangible topic matters stated in Liu (2010). Furthermore, three types of attitudes were observed by Liu (2010), 1) positive opinions, 2) neutral opinions, and 3) negative opinions. Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem. Each class's collections of words or phrase indicators are defined for to locate desirable patterns on unannotated text. For subjective expression, a different word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in the linguist and natural language processing field states in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, in subjective detection, the features extraction progression from curating features by hand to automated features learning. At the moment, automated learning methods can further separate into supervised and unsupervised machine learning. Patterns extraction with machine learning process annotated and unannotated text have been explored extensively by academic researchers. However, researchers recognized several challenges in developing fixed sets of rules for expressions respectably. Much of the challenges in rule development stems from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writings, 3) context-sensitive, 4) represented words with fewer usages, 5) time-sensitive, and 6) ever-growing volume. Metaphorical expressions. The text contains metaphoric expression may impact on the performance on the extraction. Besides, metaphors take in different forms, which may have been contribu

    Read more →
  • Data hub

    Data hub

    A data hub is a center of data exchange that is supported by data science, data engineering, and data warehouse technologies to interact with endpoints such as applications and algorithms. == Features == A data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data. A data hub differs from a data lake by homogenizing data and possibly serving data in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data such as de-duplication, quality, security, and a standardized set of query services. A data lake tends to store data in one place for availability, and allow/require the consumer to process or add value to the data. Data hubs are ideally the "go-to" place for data within an enterprise, so that many point-to-point connections between callers and data suppliers do not need to be made, and so that the data hub organization can negotiate deliverables and schedules with various data enclave teams, rather than being an organizational free-for-all as different teams try to get new services and features from many other teams.

    Read more →
  • Attensity

    Attensity

    Attensity was an American company that provided social analytics and engagement applications for social customer relationship management (social CRM). Attensity's text analytics software applications extracted facts, relationships and sentiment from unstructured data. == History == Attensity was founded in 2000. An early investor in Attensity was In-Q-Tel, which funds technology to support the missions of the US Government and the broader DOD. InTTENSITY, an independent company that has combined Inxight with Attensity Software (the only joint development project that combines two InQTel funded software packages), was the exclusive distributor and outlet for Attensity in the Federal Market. In 2009, Attensity Corp., then based in Palo Alto, merged with Germany's Empolis and Living-e AG to form Attensity Group. In 2010, Attensity Group acquired Biz360, a provider of social media monitoring and market intelligence solutions. In early 2012, Attensity Group divested itself of the Empolis business unit via a management buyout; that unit currently conducts business under its pre-merger name. Attensity Group was a closely held private company. Its majority shareholder was Aeris Capital, a private Swiss investment office advising a high-net-worth individual and his charitable foundation. Foundation Capital, Granite Ventures, and Scale Venture Partners were among Biz360's investors and thus became shareholders in Attensity Group. In February 2016, Attensity's IP assets were acquired by InContact, and Attensity closed.

    Read more →
  • Data integration

    Data integration

    Data integration is the process of combining, sharing, or synchronizing data from multiple sources to provide users with a unified view. There are a wide range of possible applications for data integration, from commercial (such as when a business merges multiple databases) to scientific (combining research data from different bioinformatics repositories). The decision to integrate data tends to arise when the volume, complexity (that is, big data) and need to share existing data explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from a heterogeneous database system and transformed to a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining when analyzing and extracting information from existing databases that can be useful for Business information. == History == Issues with combining heterogeneous data sources, often referred to as information silos, under a single query interface have existed for some time. In the early 1980s, computer scientists began designing systems for interoperability of heterogeneous databases. The first data integration system driven by structured metadata was designed in 1991 at the University of Minnesota for the Integrated Public Use Microdata Series (IPUMS). IPUMS used a data warehousing approach, which extracts, transforms, and loads data from heterogeneous sources into a unique view schema so data from different sources become compatible. By making thousands of population databases interoperable, IPUMS demonstrated the feasibility of large-scale data integration. The data warehouse approach offers a tightly coupled architecture because the data are already physically reconciled in a single queryable repository, so it usually takes little time to resolve queries. The data warehouse approach is less feasible for data sets that are frequently updated, requiring the extract, transform, load (ETL) process to be continuously re-executed for synchronization. Difficulties also arise in constructing data warehouses when one has only a query interface to summary data sources and no access to the full data. This problem frequently emerges when integrating several commercial query services like travel or classified advertisement web applications. A trend began in 2009 favoring the loose coupling of data and providing a unified query-interface to access real time data over a mediated schema (see Figure 2), which allows information to be retrieved directly from original databases. This is consistent with the SOA approach popular in that era. This approach relies on mappings between the mediated schema and the schema of original sources, and translating a query into decomposed queries to match the schema of the original databases. Such mappings can be specified in two ways: as a mapping from entities in the mediated schema to entities in the original sources (the "Global-as-View" (GAV) approach), or as a mapping from entities in the original sources to the mediated schema (the "Local-as-View" (LAV) approach). The latter approach requires more sophisticated inferences to resolve a query on the mediated schema, but makes it easier to add new data sources to a (stable) mediated schema. As of 2010, some of the work in data integration research concerns the semantic integration problem. This problem addresses not the structuring of the architecture of the integration, but how to resolve semantic conflicts between heterogeneous data sources. For example, if two companies merge their databases, certain concepts and definitions in their respective schemas like "earnings" inevitably have different meanings. In one database it may mean profits in dollars (a floating-point number), while in the other it might represent the number of sales (an integer). A common strategy for the resolution of such problems involves the use of ontologies which explicitly define schema terms and thus help to resolve semantic conflicts. This approach represents ontology-based data integration. On the other hand, the problem of combining research results from different bioinformatics repositories requires bench-marking of the similarities, computed from different data sources, on a single criterion such as positive predictive value. This enables the data sources to be directly comparable and can be integrated even when the natures of experiments are distinct. As of 2011, it was determined that current data modeling methods were imparting data isolation into every data architecture in the form of islands of disparate data and information silos. This data isolation is an unintended artifact of the data modeling methodology that results in the development of disparate data models. Disparate data models, when instantiated as databases, form disparate databases. Enhanced data model methodologies have been developed to eliminate the data isolation artifact and to promote the development of integrated data models. One enhanced data modeling method recasts data models by augmenting them with structural metadata in the form of standardized data entities. As a result of recasting multiple data models, the set of recast data models will now share one or more commonality relationships that relate the structural metadata now common to these data models. Commonality relationships are a peer-to-peer type of entity relationships that relate the standardized data entities of multiple data models. Multiple data models that contain the same standard data entity may participate in the same commonality relationship. When integrated data models are instantiated as databases and are properly populated from a common set of master data, then these databases are integrated. Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) Enterprise Data Warehouses. Since 2013, data lake approaches have risen to the level of Data Hubs. (See all three search terms popularity on Google Trends.) These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the Hub. In recent times, as the number of applications being used have increased many fold and application to application integration have become critical and this has given rise to [Unified APIs] that help application developers integrate their apps with other apps and more recently with [MCP - Model Context Protocol] taking it a step further for AI Agents. Data integration plays a big role in business regarding data collection used for studying the market. Converting the raw data retrieved from consumers into coherent data is something businesses try to do when considering what steps they should take next. Organizations are more frequently using data mining for collecting information and patterns from their databases, and this process helps them develop new business strategies to increase business performance and perform economic analyses more efficiently. Compiling the large amount of data they collect to be stored in their system is a form of data integration adapted for Business intelligence to improve their chances of success. == Example == Consider a web application where a user can query a variety of information about cities (such as crime statistics, weather, hotels, demographics, etc.). Traditionally, the information must be stored in a single database with a single schema. But any single enterprise would find information of this breadth somewhat difficult and expensive to collect. Even if the resources exist to gather the data, it would likely duplicate data in existing crime databases, weather websites, and census data. A data-integration solution may address this problem by considering these external resources as materialized views over a virtual mediated schema, resulting in "virtual data integration". This means application-developers construct a virtual schema—the mediated schema—to best model the kinds of answers their users want. Next, they design "wrappers" or adapters for each data source, such as the crime database and weather website. These adapters simply transform the local query results (those returned by the respective websites or databases) into an easily processed form for the data integration solution (see figure 2). When an application-user queries the mediated schema, the data-integration solution transforms this query into appropriate queries over the respective data sources. Finally, the virtual database combines the results of these queries into the answer to the user's query. This solution offers the convenience of adding new sources by simply constructing an adapter or an application software blade for them. It contrasts with ETL systems or with a si

    Read more →
  • Social media newsroom

    Social media newsroom

    A social media newsroom is a company resource, set up to increase the functionality and usability of the traditional online newsroom. Social media newsrooms (SMNs) are intended to encourage dialogue and information sharing. Unlike online newsrooms, content is accessible to more than just journalists, but to all those with whom the company engages such as bloggers, their prospects, customers, business partners and investors. It gives these stakeholders access to news, public relations announcements, images, audio, video and other multimedia files. In addition to posting press releases and corporate news, companies can integrate other social content from sites such as YouTube, Flickr and Slideshow as well as streams from corporate Twitter accounts. Traditional tools for journalists such as corporate fast facts, leadership information, a multimedia library, financial information, awards and other recent media coverage are also included in an SMN. Examples of companies effectively using social media newsrooms include Opel Group, Pressat, First Direct, MyNewsdesk, Scania and Newport Beach.

    Read more →
  • Media engagement framework

    Media engagement framework

    The media engagement framework is a planning framework used by marketing professionals to understand the behavior of social media marketing-based audiences. The construct was introduced in the book, ROI of Social Media. Powell’s background in marketing ROI and Groves' experience and understanding of the applications of social media in business led to a collaboration. Dimos joined as a brand strategist for Litmus Group, a global management consulting firm. The media engagement framework consists of the definitions of personas (Individuals, Consumers and Influencers), referenced by the competitive set or constraint that applies to that persona and the measurement framework that might be applied to those personas. It is referenced at the center of the marketing process diagram, surrounded by the marketing functions of strategy, tactics, metrics and ROI. The marketing process diagram describes how the media engagement framework can apply to any strategic marketing activity but was developed to establish a completely integrated framework describing how both traditional and social media marketing activities can be planned, executed, measured and improved. == Application == The media engagement framework provides a strategic planning construct in which measurements and metrics play a crucial role. Applying the media engagement framework aids in the development and management of an effective online marketing presence leveraging social media to engage a market or audience. By first personifying the audience, the marketer is able to identify the limiting aspect of the engagements possible with that audience segment and then, understand the type of engagement metrics to apply. Each persona makes decisions differently about how he/she acts in the social media universe. A framework metric can be applied for each of these personas: Endorsement funnel for influencers Community engagement funnel for individuals Purchase funnel for consumers Individuals, influencers and consumers make decisions based on alternatives available to them and constraints put on them. To engage with an individual brands must realize they are competing against the time an individual spends on line. If they find something else more engaging, they will engage with that activity. Brands compete against other brands for the purchases of consumers acting in the category. Lastly, influencers have only so many endorsements they can make and therefore brands compete with other endorsers for the endorsement of an influencer. Creating engaging content by keeping target audience in mind like create content that audience find it funny, interesting, and relatable will encourage audience to share it on social networks. Which will be beneficial for you brand, getting more people to know about your business and brand. Contact Digilord to create engaging content for your brand. Use of listening tools (Google Alerts, Twitter Search, SocialMention.com, Veooz.com, Alterian SM2, Radian6, Sysomos, Buzzient etc.) can be employed within the model to help identify the members of the audience segment and to support the formation of other social engagement planning and management tools.

    Read more →
  • Outlook on the web

    Outlook on the web

    Outlook on the web (formerly Outlook Web App and Outlook Web Access) is a personal information manager web app from Microsoft. It is a web-based version of Microsoft Outlook, and is included in Exchange Server and Exchange Online (a component of Microsoft 365). It can be freely accessed from any web browser whether inside or outside an organization's network, and includes a web email client, a calendar tool, a contact manager, and a task manager. It also includes add-in integration, Skype on the web, and alerts as well as unified themes that span across all the web apps. == Purpose == Outlook on the web is available to Microsoft 365 (formerly Office 365) and Exchange Online subscribers, and is included with the on-premises Exchange Server, to enable users to connect to their email accounts via a web browser, without requiring the installation of Microsoft Outlook or other email clients. In case of Exchange Server, it is hosted on a local intranet and requires a network connection to the Exchange Server for users to work with e-mail, address book, calendars and task. The Exchange Online version, which can be bought either independently or through Office 365 licensing program, is hosted on Microsoft servers on the World Wide Web. == History == Outlook Web Access was created in 1995 by Microsoft Program Manager Thom McCann on the Exchange Server team. An early working version was demonstrated by Microsoft Vice President Paul Maritz at Microsoft's famous Internet summit in Seattle on December 27, 1995. The first customer version was shipped as part of the Exchange Server 5.0 release in early 1997. The first component to allow client-side scripts to issue HTTP requests (XMLHTTP) was originally written by the Outlook Web Access team. It soon became a part of Internet Explorer 5. Renamed XMLHttpRequest and standardized by the World Wide Web Consortium, it has since become one of the cornerstones of the Ajax technology used to build advanced web apps. Outlook Web Access was later renamed Outlook Web App in 2010. An update on August 4, 2015, renamed OWA to "Outlook on the web", often referred to in brief as simply "Outlook". == Components == === Mail === Mail is the webmail component of Outlook on the web. The default view is a three column view with folders and groups on the left, an email message list in the middle, and the selected message on the right. With the 2015 update, Microsoft introduced the ability to pin, sweep and archive messages, and undo the last action, as well as richer image editing features. It can connect to other services such as GitHub and Twitter through Office 365 Connectors. Actionable Messages in emails allows a user to complete a task from within the email, such as retweeting a Tweet on Twitter or setting a meeting date on a calendar. Outlook on the web supports S/MIME and includes features for managing calendars, contacts, tasks, documents (used with SharePoint or Office Web Apps), and other mailbox content. In the Exchange 2007 release, Outlook on the web (still called Outlook Web App at the time) also offers read-only access to documents stored in SharePoint sites and network UNC shares. === Calendar === Calendar is the calendaring component of Outlook on the web. With the update, Microsoft added a weather forecast directly in the Calendar, as well as icons (or "charms") as visual cues for an event. In addition, email reminders came to all events, and a special Birthday and Holiday event calendars are created automatically. Calendars can be shared and there are multiple views such as day, week, month, and today. Another view is work week which includes Mondays through Fridays in the calendar view. Calendar's "Board View" feature allows for a customizable calendar with widgets such as Goal, Calendar, Tasks and Tips. Calendar details can be added with HTML and rich-text editing, and files can be attached to calendar events and appointments. === People === People is the contact manager component of Outlook on the web. A user can search and edit existing contacts, as well as create new ones. Contacts can be placed into folders and duplicate contacts can be linked from multiple sources such as LinkedIn or Twitter. In Outlook Mail, a contact can be created by clicking on an email address sender, which pulls down a contact card with an add button to add to Outlook People. Contacts can be imported as well as placed into a list that can be utilized when composing an email in Outlook Mail. People can also sync with friends and connections lists on LinkedIn, Facebook, and Twitter. === To Do === To Do was originally launched as Tasks for Outlook Web App. Microsoft was slowly rolling out a preview of Tasks to its consumer-based Outlook.com service that in May 2015, was announced to be moving to the Office 365 infrastructure. It was initially a part of Calendar as a view. Microsoft has separated the services into its own web app in Outlook on the web. In a post on the Office Blogs in 2015, Microsoft announced that Outlook Web App would be renamed Outlook on the web and that Tasks would move under that brand. A user can create tasks, put them into categories, and move them to another folder. A feature added was the ability to set due days and sort and filter the tasks according to those criteria. The app provides the user with fields such as subject, start and end dates, percent complete, priority, and how much work was put into each task. Rich editing features like bold, italic, underline, numbering, and bullet points were also introduced. Tasks can be edited and categorized according to how the user wishes them to be sorted. == Removed features == Outlook on the web has had two interfaces available: one with a complete feature set (known as Premium) and one with reduced functionality (known as Light or sometimes Lite). Prior to Exchange 2010, the Premium client required Internet Explorer. Exchange 2000 and 2003 require Internet Explorer 5 and later, and Exchange 2007 requires Internet Explorer 6 and later. Exchange 2010 supports a wider range of web browsers: Internet Explorer 7 or later, Firefox 3.01 or later, Chrome, or Safari 3.1 or later. However, Exchange 2010 restricts its Firefox and Safari support to macOS and Linux. In Exchange 2013, these browser restrictions were lifted. In Exchange 2010 and earlier, the Light user interface is rendered for browsers other than Internet Explorer. The basic interface did not support search on Exchange Server 2003. In Exchange Server 2007, the Light interface supported searching mail items; managing contacts and the calendar was also improved. The 2010 version can connect to an external email account. The ability to add new accounts to Outlook on the web using the Connected accounts feature was removed in September 2018 and all connected accounts stopped synchronizing email the following month.

    Read more →
  • Atomicity (database systems)

    Atomicity (database systems)

    In database systems, atomicity (; from Ancient Greek: ἄτομος, romanized: átomos, lit. 'undividable') is the property of a database transaction consisting of an indivisible and irreducible series of database operations such that either all occur, or none occur. It is one of the ACID transaction properties: Atomicity, Consistency, Isolation, Durability. A guarantee of atomicity prevents partial database updates from occurring, because they can cause greater problems than rejecting the whole series outright. As a consequence, an atomic transaction cannot be observed to be in progress by another database client: at one moment in time, it has not yet happened, and at the next it has already occurred in whole (or nothing happened if the transaction was cancelled in progress). An example of transaction atomicity could be a digital monetary transfer from bank account A to account B. It consists of two operations, debiting the money from account A and crediting it to account B. Performing both of these operations inside of an atomic transaction ensures that the database remains in a consistent state, if either operation fails there will not be any unaccountable credits or debits affecting either account. The same term is also used in the definition of First normal form in database systems, where it instead refers to the concept that the values for fields may not consist of multiple smaller values to be decomposed, such as a string into which multiple names, numbers, dates, or other types may be packed. == Orthogonality == Atomicity does not behave completely orthogonally with regard to the other ACID properties of transactions. For example, isolation relies on atomicity to roll back the enclosing transaction in the event of an isolation violation such as a deadlock; consistency also relies on atomicity to roll back the enclosing transaction in the event of a consistency violation by an illegal transaction. As a result of this, a failure to detect a violation and roll back the enclosing transaction may cause an isolation or consistency failure. == Implementation == Typically, systems implement Atomicity by providing some mechanism to indicate which transactions have started and which finished; or by keeping a copy of the data before any changes occurred (Read-copy-update). Several filesystems have developed methods for avoiding the need to keep multiple copies of data, using journaling (see journaling file system). Databases usually implement this using some form of logging/journaling to track changes. The system synchronizes the logs (often the metadata) as necessary after changes have successfully taken place. Afterwards, crash recovery ignores incomplete entries. Although implementations vary depending on factors such as concurrency issues, the principle of atomicity – i.e. complete success or complete failure – remain. Ultimately, any application-level implementation relies on operating-system functionality. At the file-system level, POSIX-compliant systems provide system calls such as open(2) and flock(2) that allow applications to atomically open or lock a file. At the process level, POSIX Threads provide adequate synchronization primitives. The hardware level requires atomic operations such as Test-and-set, Fetch-and-add, Compare-and-swap, or Load-Link/Store-Conditional, together with memory barriers. Portable operating systems cannot simply block interrupts to implement synchronization, since hardware that lacks concurrent execution such as hyper-threading or multi-processing is now extremely rare. In distributed and sharded databases, atomicity is complicated by network latency and the potential for partial failures. While traditional distributed systems often employ locking protocols (like 2PC) to ensure cross-shard atomicity, these can introduce performance bottlenecks. Recent research into distributed ledger consensus suggests alternative models, such as "braided synchronization". This technique, utilized in protocols like Cerberus, intertwines the consensus phases of multiple shards to enforce atomic guarantees without a global ordering of all transactions.

    Read more →
  • Social media mining

    Social media mining

    Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports targeting advertising to users or academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find the valuable minerals; likewise, social media mining sifts through social media data in order to discern patterns and trends about matters such as social media usage, online behaviour, content sharing, connections between individuals, buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, as such organizations can use the analyses for tasks such as design strategies, introduce programs, products, processes or services. Social media mining uses concepts from computer science, data mining, machine learning, and statistics. Mining is based on social network analysis, network science, sociology, ethnography, optimization and mathematics. It attempts to formally represent, measure and model patterns from social media data. In the 2010s, major corporations, governments and not-for-profit organizations began mining to learn about customers, clients and others. Platforms such as Google, Facebook (partnered with Datalogix and BlueKai) conduct mining to target users with advertising. Scientists and machine learning researchers extract insights and design product features. Users may not understand how platforms use their data. Users tend to click through Terms of Use agreements without reading them, leading to ethical questions about whether platforms adequately protect users' privacy. During the 2016 United States presidential election, Facebook allowed Cambridge Analytica, a political consulting firm linked to the Trump campaign, to analyze the data of an estimated 87 million Facebook users to profile voters, creating controversy when this was revealed. == Background == As defined by Kaplan and Haenlein, social media is the "group of internet-based applications that build on the ideological and technological foundations of Web 2.0, and that allow the creation and exchange of user-generated content." There are many categories of social media including, but not limited to, social networking (Facebook or LinkedIn), microblogging (Twitter), photo sharing (Flickr, Instagram, Photobucket, or Picasa), news aggregation (Google Reader, StumbleUpon, or Feedburner), video sharing (YouTube, MetaCafe), livecasting (Ustream or Twitch), virtual worlds (Kaneva), social gaming (World of Warcraft), social search (Google, Bing, or Ask.com), and instant messaging (Google Talk, Skype, or Yahoo! messenger). The first social media website was introduced by GeoCities in 1994. It enabled users to create their own homepages without having a sophisticated knowledge of HTML coding. The first social networking site, SixDegrees.com, was introduced in 1997. Since then, many other social media sites have been introduced, each providing service to millions of people. These individuals form a virtual world in which individuals (social atoms), entities (content, sites, etc.) and interactions (between individuals, between entities, between individuals and entities) coexist. Social norms and human behavior govern this virtual world. By understanding these social norms and models of human behavior and combining them with the observations and measurements of this virtual world, one can systematically analyze and mine social media. Social media mining is the process of representing, analyzing, and extracting meaningful patterns from data in social media, resulting from social interactions. It is an interdisciplinary field encompassing techniques from computer science, data mining, machine learning, social network analysis, network science, sociology, ethnography, statistics, optimization, and mathematics. Social media mining faces grand challenges such as the big data paradox, obtaining sufficient samples, the noise removal fallacy, and evaluation dilemma. Social media mining represents the virtual world of social media in a computable way, measures it, and designs models that can help us understand its interactions. In addition, social media mining provides necessary tools to mine this world for interesting patterns, analyze information diffusion, study influence and homophily, provide effective recommendations, and analyze novel social behavior in social media. == Uses == Social media mining is used across several industries including business development, social science research, health services, and educational purposes. Once the data received goes through social media analytics, it can then be applied to these various fields. Often, companies use the patterns of connectivity that pervade social networks, such as assortativity—the social similarity between users that are induced by influence, homophily, and reciprocity and transitivity. These forces are then measured via statistical analysis of the nodes and connections between these nodes. Social analytics also uses sentiment analysis, because social media users often relay positive or negative sentiment in their posts. This provides important social information about users' emotions on specific topics. These three patterns have several uses beyond pure analysis. For example, influence can be used to determine the most influential user in a particular network. Companies would be interested in this information in order to decide who they may hire for influencer marketing. These influencers are determined by recognition, activity generation, and novelty—three requirements that can be measured through the data mined from these sites. Analysts also value measures of homophily: the tendency of two similar individuals to become friends. Users have begun to rely on information of other users' opinions in order to understand diverse subject matter. These analyses can also help create recommendations for individuals in a tailored capacity. By measuring influence and homophily, online and offline companies are able to suggest specific products for individuals consumers, and groups of consumers. Social media networks can use this information themselves to suggest to their users possible friends to add, pages to follow, and accounts to interact with. == Perception == Modern social media mining is a controversial practice that has led to exponential gains in user growth for tech giants such as Facebook, Inc., Twitter, and Google. Companies such as these, considered "Big Tech" are companies that build algorithms that take advantage of user input to understand their preferences, and keep them on the platform as much as possible. These inputs, that can be as simple as time spent on a given screen, provide the data being mined, and lead to companies profiting heavily from using that data to capitalize on extremely accurate predictions about user behavior. The growth of platforms accelerated rapidly once these strategies were put in place; Most of the largest platforms now average over 1 billion active users per month as of 2021. It has been claimed by a multitude of anti-algorithm personalities, like Tristan Harris or Chamath Palihapitiya, that certain companies (specifically Facebook) valued growth above all else, and ignored potential negative impacts from these growth engineering tactics. At the same time, users have now created their own data arbitrages with the help of their own data, through content monetization and becoming influencers. Users typically have access to a varied set of analytics specific to people that interact with them on social media, and can use these as building blocks for their own targeting and growth strategies through ads and posts that cater to their audiences. Influencers also commonly promote products and services for established brands, creating one of the largest digital industries: Influencer marketing. Instagram, Facebook, Twitter, YouTube, Google, and others have long given access to platform analytics, and allowed third parties to access that information as well, at times unbeknownst to even the user whose data is being viewed/bought. == Research == === Research areas === Social media event detection – Social networks enable users to freely communicate with each other and share their recent news, ongoing activities or views about different topics. As a result, they can be seen as a potentially viable source of information to understand the current emerging topics/events. Public health monitoring and surveillance - Using large-scale analysis of social media to study large cohorts of patients and the general public, e.g. to obtain early warning signals of drug-drug interactions and adverse drug reactions, or understand human reproduction and sexual interest. Community structure (Community Detection/Evolution/Evaluation) – Identifying communities on social networks, how t

    Read more →