In statistics, the relationship square is a graphical representation used in the factorial analysis of an individuals × variables table. This representation complements the classical representations provided by principal component analysis (PCA) or multiple correspondence analysis (MCA), namely those of the individuals, of the quantitative variables (correlation circle) and of the categories of qualitative variables (each at the centroid of the individuals that possess it). It is especially important in factor analysis of mixed data (FAMD) and in multiple factor analysis (MFA). == Definition of the relationship square in the MCA framework == The first benefit of the relationship square is that it represents the variables themselves, not their categories, which is all the more valuable when there are many variables. For this, for each qualitative variable $j$ and each factor $F_s$ (the factor of rank $s$, i.e. the vector of coordinates of the individuals along the axis of rank $s$; in PCA, $F_s$ is called the principal component of rank $s$), we calculate the square of the correlation ratio between $F_s$ and the variable $j$, usually denoted $\eta^2(j, F_s)$. Thus, to each factorial plane we can associate a representation of the qualitative variables themselves. Because these coordinates lie between 0 and 1, the variables appear within the square whose vertices are the points (0,0), (0,1), (1,0) and (1,1). == Example in MCA == Six individuals $(i_1, \ldots, i_6)$ are described by three variables $(q_1, q_2, q_3)$ having respectively 3, 2 and 3 categories.
For example, the individual $i_1$ possesses category $a$ of $q_1$, category $d$ of $q_2$ and category $f$ of $q_3$. Applied to these data, the MCA function of the R package FactoMineR produces the classical graph shown in Figure 1. The relationship square (Figure 2) makes this classical factorial plane easier to read. It indicates that the first factor is related to all three variables, but especially to $q_3$ (which has a very high coordinate along the first axis) and then to $q_2$; and that the second factor is related only to $q_1$ and $q_3$ (not to $q_2$, whose coordinate along axis 2 is 0), strongly and equally to both. All this is visible on the classical graph, but less clearly. The first role of the relationship square is thus to assist in reading a conventional graph. This is valuable when the variables are numerous and have many categories. == Extensions == This representation may be supplemented with that of quantitative variables, whose coordinates are squared correlation coefficients (rather than squared correlation ratios). Thus, the second advantage of the relationship square lies in its ability to represent quantitative and qualitative variables simultaneously. The relationship square can be constructed from any factorial analysis of an individuals × variables table. In particular, it is (or should be) used systematically: in multiple correspondence analysis (MCA); in principal component analysis (PCA) when there are many supplementary variables; in factor analysis of mixed data (FAMD). An extension of this graphic to groups of variables (how to represent a group of variables by a single point?)
is used in multiple factor analysis (MFA). == History == The idea of representing the qualitative variables themselves by a point (rather than their categories) is due to Brigitte Escofier. The graphic as it is used today was introduced by Brigitte Escofier and Jérôme Pagès in the framework of multiple factor analysis. == Conclusion == In MCA, the relationship square provides a synthetic view of the connections between mixed variables, all the more valuable when there are many variables with many categories. This representation can be useful in any factorial analysis when there are numerous mixed variables, active and/or supplementary.
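Both kinds of coordinate used in the relationship square can be computed directly from the factor coordinates of the individuals. These are the squared correlation ratio $\eta^2(j, F_s)$ for a qualitative variable and the squared correlation coefficient for a quantitative one. The Python sketch below is an illustration, not FactoMineR's implementation. It assumes every individual has the same weight and that the factors $F_1$ and $F_2$ have already been obtained from the analysis. The function names are hypothetical.

```python
from collections import defaultdict


def correlation_ratio_sq(categories, factor):
    """eta^2(j, F_s): squared correlation ratio between a qualitative
    variable j (one category label per individual) and a factor F_s
    (one coordinate per individual). Individuals are equally weighted."""
    if len(categories) != len(factor):
        raise ValueError("categories and factor must have the same length")
    n = len(factor)
    mean = sum(factor) / n
    total_ss = sum((x - mean) ** 2 for x in factor)
    if total_ss == 0:
        raise ValueError("factor has zero variance")
    groups = defaultdict(list)
    for cat, x in zip(categories, factor):
        groups[cat].append(x)
    # Between-category sum of squares divided by the total sum of squares.
    between_ss = sum(len(xs) * (sum(xs) / len(xs) - mean) ** 2
                     for xs in groups.values())
    return between_ss / total_ss


def correlation_sq(values, factor):
    """r^2(x, F_s): squared Pearson correlation between a quantitative
    variable x and a factor F_s, with equally weighted individuals."""
    if len(values) != len(factor):
        raise ValueError("values and factor must have the same length")
    n = len(values)
    mx = sum(values) / n
    mf = sum(factor) / n
    sxy = sum((a - mx) * (b - mf) for a, b in zip(values, factor))
    sxx = sum((a - mx) ** 2 for a in values)
    sff = sum((b - mf) ** 2 for b in factor)
    if sxx == 0 or sff == 0:
        raise ValueError("zero variance")
    return sxy * sxy / (sxx * sff)


def square_point(variable, f1, f2, qualitative):
    """Coordinates of one variable in the relationship square of the
    plane (F1, F2); both coordinates lie in [0, 1]."""
    measure = correlation_ratio_sq if qualitative else correlation_sq
    return (measure(variable, f1), measure(variable, f2))


f1 = [1.0, 1.0, -1.0, -1.0]
f2 = [1.0, -1.0, 1.0, -1.0]
print(square_point(["a", "a", "b", "b"], f1, f2, qualitative=True))       # (1.0, 0.0)
print(square_point([3.0, 1.0, 1.0, -1.0], f1, f2, qualitative=False))     # (0.5, 0.5)
```

Because both measures are bounded by 0 and 1, a qualitative and a quantitative variable can be plotted as points in the same unit square. This is the simultaneous representation described under Extensions.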
Time-compressed speech
Time-compressed speech refers to an audio recording of verbal text in which the text is presented in a much shorter time interval than it would be in normally paced, real-time speech. The basic purpose is to make recorded speech contain more words in a given time while remaining understandable. For example, a paragraph that might normally be expected to take 20 seconds to read might instead be presented in 15 seconds, which would represent a time compression of 25% (5 seconds out of 20). The term "time-compressed speech" should not be confused with "speech compression", which controls the volume range of a sound but does not alter its time envelope. == Methods == While some voice talents are capable of speaking at rates significantly in excess of general norms, the term "time-compressed speech" most usually refers to examples in which the time reduction has been accomplished through some form of electronic processing of the recorded speech. In general, recorded speech can be electronically time-compressed by: increasing its speed (linear compression); removing silences (selective editing); or a combination of the two (non-linear compression). The speed of a recording can be increased, which causes the material to be presented at a faster rate (and hence in a shorter amount of time). However, this has the undesirable side effect of raising every frequency in the passage, and thus the pitch of the voices, which can reduce intelligibility. There are normally silences between words and sentences, and even small silences within certain words, all of which can be reduced or removed ("edited out"), which also reduces the time occupied by the full recording. However, this can also remove the verbal "punctuation" from the speech, causing words and sentences to run together unnaturally and again reducing intelligibility. Vowels are typically held for a minimum of 20 milliseconds, spanning many cycles of the fundamental pitch.
DSP systems can detect the beginning and end of each cycle and then skip over some fraction of those cycles, causing the material to be presented at a faster rate without changing the pitch, thus maintaining a "normal" tone of voice. The current preferred method of time compression is called "non-linear compression". It combines three steps: selectively removing silences; speeding up the speech so that the shortened silences sound normally proportioned to the text; and finally applying signal-processing algorithms to bring the speech back down to the proper pitch. This produces a more acceptable result than either of the two earlier techniques. However, if unrestrained, removing the silences and increasing the speed can make a passage of speech sound more insistent, possibly to the point of unpleasantness. == Applications == === Advertising === Time-compressed speech is frequently used in television and radio advertising. Its advantage is that the same number of words can be compressed into a smaller amount of time, reducing advertising costs and/or allowing more information to be included in a given radio or TV advertisement. It is usually most noticeable in the information-dense caveats and disclaimers presented (usually by legal requirement) at the end of commercials, the aural equivalent of the "fine print" in a printed contract. This practice, however, is not new: before electronic methods were developed, spokespeople who could talk extremely quickly and still be understood were widely used as voice talents for radio and TV advertisements, especially for recording such disclaimers. === Education === Time-compressed speech has educational applications such as increasing the information density of training sessions and serving as a study aid.
A number of studies have demonstrated that the average person can comprehend speech delivered at higher-than-normal rates relatively easily, with the peak occurring at around 25% compression (that is, presentation in 25% less time than normal); this ability has been demonstrated in several languages. Conversational speech in English takes place at a rate of around 150 wpm (words per minute), but the average person is able to comprehend speech presented at up to 200-250 wpm without undue difficulty. Blind and severely visually impaired subjects achieved similar comprehension levels at even higher rates, up to 300-350 wpm. Blind people have been found to use time-compressed speech extensively, for example when reviewing recorded lectures from high school and college classes, or professional training sessions. Comprehension rates in older blind subjects have been found to be as good as, or in some cases better than, those found in younger sighted subjects. Other studies have determined that the ability to comprehend highly time-compressed speech tends to fall off with age. It is also reduced when the language of the time-compressed speech is not the listener's native language. Non-native speakers can, however, improve their comprehension of time-compressed speech with multiday training. === Voice Mail === Voice mail systems have employed time-compressed speech since the 1970s. In this application, the technology enables a relatively small number of people to review messages rapidly in high-traffic systems. === Streaming Multimedia === Time-compressed speech has been explored as one of several interrelated factors that may be manipulated to increase the efficiency of streaming multimedia presentations, by significantly reducing the latency involved in transferring large digitally encoded media files.
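The compression arithmetic and the two editing methods described in this article can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of any particular product. The audio is a plain list of amplitude samples. The silence threshold, the minimum run length, the pitch period and the drop interval are all hypothetical parameters. The pitch period is assumed to be fixed and already known, whereas real DSP systems must detect each cycle's boundaries and usually smooth the joins.

```python
def compression_percent(original_seconds, compressed_seconds):
    """Share of the original duration removed, in percent
    (20 s presented in 15 s -> 25.0)."""
    return 100.0 * (original_seconds - compressed_seconds) / original_seconds


def rate_after_compression(base_wpm, percent):
    """Words per minute heard after `percent` time compression."""
    if not 0 <= percent < 100:
        raise ValueError("percent must be in [0, 100)")
    return base_wpm / (1 - percent / 100)


def remove_silences(samples, threshold, min_run):
    """Selective editing: drop every run of at least `min_run` consecutive
    samples quieter than `threshold`; shorter pauses are kept."""
    out, quiet = [], []
    for s in samples:
        if abs(s) < threshold:
            quiet.append(s)
            continue
        if len(quiet) < min_run:
            out.extend(quiet)  # short pause: keep it
        quiet = []
        out.append(s)
    if len(quiet) < min_run:
        out.extend(quiet)
    return out


def skip_cycles(samples, period, drop_every):
    """Pitch-preserving compression: cut the signal into cycles of
    `period` samples and drop every `drop_every`-th cycle. The kept
    cycles are unchanged, so their period (the pitch) stays the same."""
    if period <= 0 or drop_every < 2:
        raise ValueError("need period > 0 and drop_every >= 2")
    out = []
    for i, start in enumerate(range(0, len(samples), period)):
        if (i + 1) % drop_every != 0:
            out.extend(samples[start:start + period])
    return out


print(compression_percent(20, 15))                                   # 25.0
print(rate_after_compression(150, 25))                               # 200.0
print(remove_silences([5, 0, 0, 0, 5, 0, 5], threshold=1, min_run=3))  # [5, 5, 0, 5]
print(skip_cycles(list(range(12)), period=3, drop_every=2))          # [0, 1, 2, 6, 7, 8]
```

With these definitions, compressing 150 wpm speech by 25% gives 200 wpm, a rate about 1.33 times normal. A 40% compression gives 250 wpm. Dropping every fourth cycle shortens a voiced passage by 25% without shifting its pitch.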
Xinhua–Sogou AI news anchor
Xinhua News Agency and Sogou of China developed an artificial intelligence (AI) for news reporting purposes. The AI was unveiled in 2018 and was touted as the "world's first AI news anchor". == History == The AI was unveiled at the 2018 World Internet Conference in Wuzhen, Zhejiang, China. It uses avatars modeled on real-life Xinhua anchors: the one based on Qiu Hao speaks Chinese, while the one derived from the likeness of Zhang Zhao speaks English. The unveiling of the AI raised concerns about its impact on employment. In 2019, Xinhua and Sogou unveiled Xin Xiaomeng, an AI with a female avatar. People's Daily followed suit by unveiling its own AI newscaster in 2023.
Protégé (software)
Protégé is a free, open-source ontology editor and knowledge management system. The Protégé meta-tool was first built by Mark Musen in 1987 and has since been developed by a team at Stanford University. The software is the most popular and widely used ontology editor in the world. == Overview == Protégé provides a graphical user interface for defining ontologies. It also includes deductive classifiers that validate that models are consistent and infer new information based on the analysis of an ontology. Like Eclipse, Protégé is a framework for which various other projects provide plugins. The application is written in Java and makes heavy use of Swing to create the user interface. According to its website, there are over 300,000 registered users. A 2009 book calls it "the leading ontological engineering tool". Protégé is made available under the BSD 2-clause license. Earlier versions of the tool were developed in collaboration with the University of Manchester.
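Protégé's classifiers handle the full expressiveness of OWL, far beyond what can be shown here. Still, the two tasks mentioned above, inferring new information and checking consistency, can be illustrated with a toy Python sketch. It is not Protégé's API and not how an OWL reasoner works internally. It assumes an ontology reduced to asserted subclass links plus declared disjoint class pairs, and the class names are hypothetical. Implied superclasses are inferred transitively, and any class that falls under two disjoint classes is reported as unsatisfiable.

```python
def superclasses(cls, parents):
    """All direct and inferred superclasses of `cls`, given a mapping
    class -> list of asserted direct superclasses."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def unsatisfiable(parents, disjoint_pairs):
    """Classes that fall under both members of a declared disjoint pair,
    i.e. classes that can have no instances."""
    bad = set()
    for cls in parents:
        ancestors = superclasses(cls, parents) | {cls}
        if any(a in ancestors and b in ancestors for a, b in disjoint_pairs):
            bad.add(cls)
    return bad


parents = {
    "Dog": ["Mammal"],
    "Mammal": ["Animal"],
    "Animal": ["LivingThing"],
    "Plant": ["LivingThing"],
    "Dogwood": ["Plant", "Dog"],  # modelling error: a plant that is a dog
}
print(sorted(superclasses("Dog", parents)))           # ['Animal', 'LivingThing', 'Mammal']
print(unsatisfiable(parents, [("Animal", "Plant")]))  # {'Dogwood'}
```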
Ethics of artificial intelligence
The ethics of artificial intelligence covers a broad range of topics within AI that are considered to have particular ethical stakes. This includes algorithmic biases, fairness, accountability, transparency, privacy, and regulation, particularly where systems influence or automate human decision-making. It also covers various emerging or potential future challenges such as machine ethics (how to make machines that behave ethically), lethal autonomous weapon systems, arms race dynamics, AI safety and alignment, technological unemployment, AI-enabled misinformation, how to treat certain AI systems if they have a moral status (AI welfare and rights), artificial superintelligence and existential risks. Some application areas may also have particularly important ethical implications, like healthcare, education, criminal justice, or the military. == Machine ethics == Machine ethics (or machine morality) is the field of research concerned with designing Artificial Moral Agents (AMAs), robots or artificially intelligent computers that behave morally or as though moral. To account for the nature of these agents, it has been suggested that certain philosophical ideas be considered, like the standard characterizations of agency, rational agency, moral agency, and artificial agency, which are related to the concept of AMAs. There have been discussions on creating tests to determine whether an AI is capable of making ethical decisions. Alan Winfield concludes that the Turing test is flawed and that the bar for an AI to pass it is too low. A proposed alternative is the Ethical Turing Test, which would improve on the current test by having multiple judges decide whether the AI's decision is ethical or unethical. Neuromorphic AI could be one way to create morally capable robots, as it aims to process information similarly to humans, nonlinearly and with millions of interconnected artificial neurons.
Similarly, whole-brain emulation (scanning a brain and simulating it on digital hardware) could also in principle lead to human-like robots that are thus capable of moral actions. Large language models, for their part, are capable of approximating human moral judgments. Inevitably, this raises the question of the environment in which such robots would learn about the world and whose morality they would inherit. It also raises the question of whether they would end up developing human 'weaknesses' as well: selfishness, pro-survival attitudes, inconsistency, scale insensitivity, etc. In Moral Machines: Teaching Robots Right from Wrong, Wendell Wallach and Colin Allen conclude that attempts to teach robots right from wrong will likely advance understanding of human ethics by motivating humans to address gaps in modern normative theory and by providing a platform for experimental investigation. As one example, this work has introduced normative ethicists to the controversial issue of which specific learning algorithms to use in machines. For simple decisions, Nick Bostrom and Eliezer Yudkowsky have argued that decision trees (such as ID3) are more transparent than neural networks and genetic algorithms. Chris Santos-Lang, by contrast, argued in favor of machine learning on the grounds that the norms of any age must be allowed to change and that natural failure to fully satisfy these particular norms has been essential in making humans less vulnerable to criminal "hackers". Some researchers frame machine ethics as part of the broader AI control or value alignment problem: the difficulty of ensuring that increasingly capable systems pursue objectives that remain compatible with human values and oversight. Stuart Russell has argued that beneficial systems should be designed to (1) aim at realizing human preferences, (2) remain uncertain about what those preferences are, and (3) learn about them from human behaviour and feedback, rather than optimizing a fixed, fully specified goal.
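Russell's three principles can be made concrete with a toy Bayesian sketch in Python. The system keeps a probability distribution over candidate human preferences (principle 2) and updates it from observed choices (principle 3). It then acts to maximize expected human reward under that belief (principle 1). This is an illustration under assumptions, not Russell's formal model. The two preference hypotheses are hypothetical. The human is also assumed to be "noisily rational", picking each option with probability proportional to exp(beta × reward), a common modelling choice in preference-learning work.

```python
import math


def update_belief(belief, rewards, options, chosen, beta=1.0):
    """Bayesian update of a belief over candidate preference models after
    observing a human pick `chosen` from `options`.

    Assumes a noisily rational human who picks option o with probability
    proportional to exp(beta * reward_h(o)) under hypothesis h."""
    posterior = {}
    for h, p in belief.items():
        weights = {o: math.exp(beta * rewards[h][o]) for o in options}
        posterior[h] = p * weights[chosen] / sum(weights.values())
    total = sum(posterior.values())
    return {h: v / total for h, v in posterior.items()}


def best_option(belief, rewards, options):
    """Act on the current belief: maximize expected reward."""
    return max(options,
               key=lambda o: sum(p * rewards[h][o] for h, p in belief.items()))


# Hypothetical example: does the person prefer tea or coffee?
rewards = {
    "prefers_tea": {"tea": 1.0, "coffee": 0.0},
    "prefers_coffee": {"tea": 0.0, "coffee": 1.0},
}
belief = {"prefers_tea": 0.5, "prefers_coffee": 0.5}  # start fully uncertain
belief = update_belief(belief, rewards, ["tea", "coffee"], chosen="tea")
print(belief)  # prefers_tea ~0.73, prefers_coffee ~0.27
print(best_option(belief, rewards, ["tea", "coffee"]))  # tea
```

A single observation shifts the belief without making it certain. This residual uncertainty is what, in Russell's argument, keeps such a system open to further human feedback.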
Some authors argue that apparent compliance with human values may reflect optimization for evaluation contexts rather than stable internal norms, which complicates the assessment of alignment in advanced language models. == Challenges == === Algorithmic biases === AI has become increasingly integral to facial and voice recognition systems. These systems may be vulnerable to biases and errors introduced by their human creators. Notably, the data used to train them can have biases. According to Allison Powell, associate professor at LSE and director of the Data and Society programme, data collection is never neutral and always involves storytelling. She argues that the dominant narrative is that governing with technology is inherently better, faster and cheaper. She proposes instead to make data expensive and to use it both minimally and valuably, with the cost of its creation factored in. Friedman and Nissenbaum identify three categories of bias in computer systems: existing bias, technical bias, and emergent bias. In natural language processing, problems can arise from the text corpus, the source material the algorithm uses to learn about the relationships between different words. Large companies such as IBM and Google, which provide significant funding for research and development, have made efforts to research and address these biases. One potential solution is to create documentation for the data used to train AI systems. Process mining can be an important tool for organizations to achieve compliance with proposed AI regulations by identifying errors, monitoring processes, identifying potential root causes for improper execution, and other functions. However, current approaches to fairness in AI also have limitations, due to the intrinsic ambiguities in the concept of discrimination at both the philosophical and legal levels. ==== Racial and gender biases ==== Bias can be introduced through historical data used to train AI systems.
For instance, Amazon stopped using an AI hiring and recruitment tool because the algorithm favored male candidates over female ones. This was because Amazon's system was trained with data collected over a 10-year period that came mostly from male candidates. The algorithm learned the biased pattern from the historical data and predicted that these types of candidates were most likely to succeed in getting the job. As a result, the recruitment decisions made by the AI system turned out to be biased against female and minority candidates. The performance of facial recognition and computer vision models may vary based on race and gender. Facial recognition algorithms made by Microsoft, IBM and Face++ all performed significantly worse on darker-skinned women. Facial recognition has been shown to be biased against people with darker skin tones. AI systems may be less accurate for black people. One example is an AI-based pulse oximeter that overestimated blood oxygen levels in patients with darker skin, causing issues with their hypoxia treatment. In 2015, controversy erupted after a Black couple was labeled "Gorillas" by Google Photos. Such systems often detect the faces of white people easily while failing to register the faces of black people. This has led some U.S. states to ban police use of AI materials or software. One reason given for these biases is that AI systems draw on data from across the internet to shape their responses in each situation, and this data may not represent all groups equally. For example, if a facial recognition system were trained and tested only on white people, it would have much more difficulty interpreting the facial structure and skin tones of other races and ethnicities. Biases often stem from the training data rather than the algorithm itself, notably when the data represents past human decisions.
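Gaps like those reported for facial and voice recognition only become visible when a system is evaluated separately for each group rather than with a single overall score. A minimal Python sketch of such a disaggregated evaluation follows. The data is invented for illustration and is not taken from any of the studies mentioned here. The group labels "A" and "B" are hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(groups, y_true, y_pred):
    """Error rate of a classifier computed separately for each group."""
    errors = defaultdict(int)
    counts = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        counts[g] += 1
        errors[g] += int(t != p)
    return {g: errors[g] / counts[g] for g in counts}


# Invented toy data: 8 predictions, 75% accurate overall.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
rates = error_rates_by_group(groups, y_true, y_pred)
print(rates)                                      # {'A': 0.0, 'B': 0.5}
print(max(rates.values()) - min(rates.values()))  # 0.5
```

Here an overall accuracy of 75% hides the fact that every error falls on group B.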
A 2020 study that reviewed voice recognition systems from Amazon, Apple, Google, IBM, and Microsoft found that they have higher error rates when transcribing black people's voices than white people's. Injustice in the use of AI is much harder to eliminate within healthcare systems, as diseases and conditions often affect different races and genders differently. This can lead to confusion, as the AI may be making decisions based on statistics showing that one patient is more likely to have problems because of their gender or race. This can be perceived as a bias because each patient is a different case, and the AI is making decisions based on the group it has been programmed to place that individual in. This leads to a discussion about what should be considered a biased decision in the distribution of treatment. While it is known that diseases and injuries affect different genders and races differently, there is a discussion on whether it is fairer to incorporate this into healthcare treatments or to examine each patient without this knowledge. In modern society there are certain tests for diseases, such as breast cancer, that are recommended to certain groups of people over others because they are more likely to contract the disease in question. If AI implements these statistics
Language engineering
Language engineering involves the creation of natural language processing systems whose cost and outputs are measurable and predictable. It is regarded as a field distinct from natural language processing and computational linguistics. A recent trend in language engineering is the use of Semantic Web technologies for the creation, archiving, processing, and retrieval of machine-processable language data. Meta-Language Engineering is a proposed extension of Language Engineering, first recorded in 2025 and associated with the work of Delyone de Paula Canedo Filho. The term designates an approach that, in addition to natural language processing, encompasses the symbolic, cognitive, and epistemological structuring of language systems.
User modeling
User modeling is the subdivision of human–computer interaction that describes the process of building up and modifying a conceptual understanding of the user. The main goal of user modeling is the customization and adaptation of systems to the user's specific needs. The system needs to "say the 'right' thing at the 'right' time in the 'right' way". To do so, it needs an internal representation of the user. Another common purpose is modeling specific kinds of users, including their skills and declarative knowledge, for use in automatic software tests. User models can thus serve as a cheaper alternative to user testing, but should not replace it. == Background == A user model is the collection and categorization of personal data associated with a specific user. A user model is a (data) structure that is used to capture certain characteristics of an individual user, and a user profile is the actual representation in a given user model. The process of obtaining the user profile is called user modeling. User modeling is therefore the basis for any adaptive changes to the system's behavior. Which data is included in the model depends on the purpose of the application. It can include personal information such as users' names and ages, their interests, their skills and knowledge, their goals and plans, their preferences and their dislikes, or data about their behavior and their interactions with the system. There are different design patterns for user models, though often a mixture of them is used. Static user models: Static user models are the most basic kind of user model. Once the main data is gathered, they are normally not changed again. Shifts in users' preferences are not registered and no learning algorithms are used to alter the model. Dynamic user models: Dynamic user models allow a more up-to-date representation of users.
Changes in their interests, their learning progress or their interactions with the system are noticed and influence the user models. The models can thus be updated and take the current needs and goals of the users into account. Stereotype-based user models: Stereotype-based user models are based on demographic statistics. Based on the gathered information, users are classified into common stereotypes, and the system then adapts to this stereotype. The application can therefore make assumptions about a user even when there is no data about that specific area, because demographic studies have shown that other users in this stereotype share the same characteristics. Thus, stereotype-based user models mainly rely on statistics and do not take into account that personal attributes might not match the stereotype. However, they allow predictions about a user even when little information about him or her is available. Highly adaptive user models: Highly adaptive user models try to represent one particular user and therefore allow a very high adaptivity of the system. In contrast to stereotype-based user models, they do not rely on demographic statistics but aim to find a specific solution for each user. Although users can benefit greatly from this high adaptivity, this kind of model needs to gather a lot of information first. == Data gathering == Information about users can be gathered in several ways. There are three main methods: Asking for specific facts while (first) interacting with the system: This kind of data gathering is mostly linked with the registration process. While registering, users are asked for specific facts, their likes and dislikes, and their needs. Often the given answers can be altered afterwards. Learning users' preferences by observing and interpreting their interactions with the system: In this case users are not asked directly for their personal data and preferences; instead, this information is derived from their behavior while interacting with the system.
The ways they choose to accomplish tasks and the combination of things they take an interest in allow inferences about a specific user. The application dynamically learns from observing these interactions. Different machine learning algorithms may be used to accomplish this task. A hybrid approach that asks for explicit feedback and alters the user model by adaptive learning: This approach is a mixture of the ones above. Users have to answer specific questions and give explicit feedback. Furthermore, their interactions with the system are observed and the derived information is used to adjust the user models automatically. Though the first method is a good way to collect main data quickly, it lacks the ability to adapt automatically to shifts in users' interests. It depends on the users' readiness to give information, and it is unlikely that they will edit their answers once the registration process is finished. Therefore, there is a high likelihood that the user models are not up to date. However, this first method gives users full control over the data collected about them: it is their decision which information they are willing to provide. This possibility is missing in the second method. Adaptive changes in a system that learns users' preferences and needs only by interpreting their behavior might appear somewhat opaque to the users, because they cannot fully understand and reconstruct why the system behaves the way it does. Moreover, the system has to collect a certain amount of data before it is able to predict the users' needs with the required accuracy. Therefore, it takes a certain learning time before a user can benefit from adaptive changes. Afterwards, however, these automatically adjusted user models allow quite accurate adaptation of the system. The hybrid approach tries to combine the advantages of both methods.
By asking its users directly, it gathers an initial stock of information that can be used for adaptive changes. By learning from the users' interactions, it can adjust the user models and become more accurate. Yet the designer of the system has to decide how much influence each kind of information should have, and what to do with learned data that contradicts information given by a user. == System adaptation == Once a system has gathered information about a user, it can evaluate that data using preset analytical algorithms and then start to adapt to the user's needs. These adaptations may concern every aspect of the system's behavior and depend on the system's purpose. Information and functions can be presented according to the user's interests, knowledge or goals by displaying only relevant features, hiding information the user does not need, proposing what to do next, and so on. One has to distinguish between adaptive and adaptable systems. In an adaptable system, the user can manually change the system's appearance, behavior or functionality by actively selecting the corresponding options. Afterwards the system will stick to these choices. In an adaptive system, the system itself automatically performs a dynamic adaptation to the user, based on the built user model. Thus, an adaptive system needs ways to interpret information about the user in order to make these adaptations. One way to accomplish this task is to implement rule-based filtering. In this case, a set of IF... THEN... rules is established that covers the knowledge base of the system. The IF-conditions check for specific user information, and if they match, the THEN-branch, which is responsible for the adaptive changes, is performed. Another approach is based on collaborative filtering. In this case, information about a user is compared to that of other users of the same system.
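A minimal Python sketch of this comparison step follows. It assumes each user model is a dictionary of attribute scores. It also assumes cosine similarity over the attributes two users share and a single nearest neighbour. These are choices made for the illustration, not something prescribed by the approach, and the function names are hypothetical. The most similar user who has data for a missing attribute supplies the guess for the current user.

```python
import math


def similarity(a, b):
    """Cosine similarity of two user models over the attributes both have."""
    shared = [k for k in a if k in b]
    dot = sum(a[k] * b[k] for k in shared)
    na = math.sqrt(sum(a[k] ** 2 for k in shared))
    nb = math.sqrt(sum(b[k] ** 2 for k in shared))
    return dot / (na * nb) if na and nb else 0.0


def fill_gap(current, others, attribute):
    """Guess `attribute` for the current user from the most similar other
    user whose model contains it; None if nobody has it."""
    candidates = [o for o in others if attribute in o]
    if not candidates:
        return None
    nearest = max(candidates, key=lambda o: similarity(current, o))
    return nearest[attribute]


current = {"jazz": 5, "rock": 1}  # no data about "blues" yet
others = [
    {"jazz": 5, "rock": 1, "blues": 4},
    {"jazz": 1, "rock": 5, "blues": 1},
]
print(fill_gap(current, others, "blues"))  # 4
```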
Thus, if characteristics of the current user match those of another, the system can make assumptions about the current user by presuming that he or she is likely to have similar characteristics in areas where the model of the current user is lacking data. Based on these assumptions, the system can then perform adaptive changes. == Usages == Adaptive hypermedia: In an adaptive hypermedia system, the displayed content and the offered hyperlinks are chosen on the basis of users' specific characteristics, taking their goals, interests, knowledge and abilities into account. Thus, an adaptive hypermedia system aims to reduce the "lost in hyperspace" syndrome by presenting only relevant information. Adaptive educational hypermedia: As a subdivision of adaptive hypermedia, adaptive educational hypermedia focuses mainly on education, displaying content and hyperlinks that correspond to the user's knowledge of the field of study. Intelligent tutoring system: Unlike adaptive educational hypermedia systems, intelligent tutoring systems are stand-alone systems. Their aim is to help students in a specific field of study. To do so, they build up a user model in which they store information about the abilities, knowledge and needs of the user. The system can now adapt to this user by presenting approp