AI Face Lift

AI Face Lift — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Information extraction

    Information extraction

    Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents and other electronically represented sources. Typically, this involves processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video/documents could be seen as information extraction. Recent advances in NLP techniques have allowed for significantly improved performance compared to previous years. An example is the extraction from newswire reports of corporate mergers, such as denoted by the formal relation: MergerBetween ⁡ ( c o m p a n y 1 , c o m p a n y 2 , d a t e ) {\displaystyle \operatorname {MergerBetween} (\mathrm {company} _{1},\mathrm {company} _{2},\mathrm {date} )} , from an online news sentence such as: "Yesterday, New York based Foo Inc. announced their acquisition of Bar Corp." A broad goal of IE is to allow computation to be done on the previously unstructured data. A more specific goal is to allow automated reasoning about the logical form of the input data. Structured data is semantically well-defined data from a chosen target domain, interpreted with respect to category and context. Information extraction is the part of a greater puzzle which deals with the problem of devising automatic methods for text management, beyond its transmission, storage and display. The discipline of information retrieval (IR) has developed automatic methods, typically of a statistical flavor, for indexing large document collections and classifying documents. Another complementary approach is that of natural language processing (NLP) which has solved the problem of modelling human language processing with considerable success when taking into account the magnitude of the task. In terms of both difficulty and emphasis, IE deals with tasks in between both IR and NLP. In terms of input, IE assumes the existence of a set of documents in which each document follows a template, i.e. describes one or more entities or events in a manner that is similar to those in other documents but differing in the details. An example, consider a group of newswire articles on Latin American terrorism with each article presumed to be based upon one or more terroristic acts. We also define for any given IE task a template, which is a(or a set of) case frame(s) to hold the information contained in a single document. For the terrorism example, a template would have slots corresponding to the perpetrator, victim, and weapon of the terroristic act, and the date on which the event happened. An IE system for this problem is required to "understand" an attack article only enough to find data corresponding to the slots in this template. == History == Information extraction dates back to the late 1970s in the early days of NLP. An early commercial system from the mid-1980s was JASPER built for Reuters by the Carnegie Group Inc with the aim of providing real-time financial news to financial traders. Beginning in 1987, IE was spurred by a series of Message Understanding Conferences. MUC is a competition-based conference that focused on the following domains: MUC-1 (1987), MUC-3 (1989): Naval operations messages. MUC-3 (1991), MUC-4 (1992): Terrorism in Latin American countries. MUC-5 (1993): Joint ventures and microelectronics domain. MUC-6 (1995): News articles on management changes. MUC-7 (1998): Satellite launch reports. Considerable support came from the U.S. Defense Advanced Research Projects Agency (DARPA), who wished to automate mundane tasks performed by government analysts, such as scanning newspapers for possible links to terrorism. == Present significance == The present significance of IE pertains to the growing amount of information available in unstructured form. Tim Berners-Lee, inventor of the World Wide Web, refers to the existing Internet as the web of documents and advocates that more of the content be made available as a web of data. Until this transpires, the web largely consists of unstructured documents lacking semantic metadata. Knowledge contained within these documents can be made more accessible for machine processing by means of transformation into relational form, or by marking-up with XML tags. An intelligent agent monitoring a news data feed requires IE to transform unstructured data into something that can be reasoned with. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. == Tasks and subtasks == Applying information extraction to text is linked to the problem of text simplification in order to create a structured view of the information present in free text. The overall goal being to create a more easily machine-readable text to process the sentences. Typical IE tasks and subtasks include: Template filling: Extracting a fixed set of fields from a document, e.g. extract perpetrators, victims, time, etc. from a newspaper article about a terrorist attack. Event extraction: Given an input document, output zero or more event templates. For instance, a newspaper article might describe multiple terrorist attacks. Knowledge Base Population: Fill a database of facts given a set of documents. Typically the database is in the form of triplets, (entity 1, relation, entity 2), e.g. (Barack Obama, Spouse, Michelle Obama) Named entity recognition: recognition of known entity names (for people and organizations), place names, temporal expressions, and certain types of numerical expressions, by employing existing knowledge of the domain or information extracted from other sentences. Typically the recognition task involves assigning a unique identifier to the extracted entity. A simpler task is named entity detection, which aims at detecting entities without having any existing knowledge about the entity instances. For example, in processing the sentence "M. Smith likes fishing", named entity detection would denote detecting that the phrase "M. Smith" does refer to a person, but without necessarily having (or using) any knowledge about a certain M. Smith who is (or, "might be") the specific person whom that sentence is talking about. Coreference resolution: detection of coreference and anaphoric links between text entities. In IE tasks, this is typically restricted to finding links between previously extracted named entities. For example, "International Business Machines" and "IBM" refer to the same real-world entity. If we take the two sentences "M. Smith likes fishing. But he doesn't like biking", it would be beneficial to detect that "he" is referring to the previously detected person "M. Smith". Relationship extraction: identification of relations between entities, such as: PERSON works for ORGANIZATION (extracted from the sentence "Bill works for IBM.") PERSON located in LOCATION (extracted from the sentence "Bill is in France.") Semi-structured information extraction which may refer to any IE that tries to restore some kind of information structure that has been lost through publication, such as: Table extraction: finding and extracting tables from documents. Table information extraction : extracting information in structured manner from the tables. This task is more complex than table extraction, as table extraction is only the first step, while understanding the roles of the cells, rows, columns, linking the information inside the table and understanding the information presented in the table are additional tasks necessary for table information extraction. Comments extraction : extracting comments from the actual content of articles in order to restore the link between authors of each of the sentences Language and vocabulary analysis Terminology extraction: finding the relevant terms for a given corpus Audio extraction Template-based music extraction: finding relevant characteristic in an audio signal taken from a given repertoire; for instance time indexes of occurrences of percussive sounds can be extracted in order to represent the essential rhythmic component of a music piece. Note that this list is not exhaustive and that the exact meaning of IE activities is not commonly accepted and that many approaches combine multiple sub-tasks of IE in order to achieve a wider goal. Machine learning, statistical analysis and/or natural language processing are often used in IE. IE on non-text documents is becoming an increasingly interesting topic in research, and information extracted from multimedia documents can now be expressed in a high level structure as it is done on text. This naturally leads to the fusion of extracted information from multiple kinds of documents and sources. == World Wide Web applications == IE has been the focus of the MUC conferences. The proliferation of the Web, however, intensified the need for developing IE systems that help people

    Read more →
  • Small data

    Small data

    Small data is data that is 'small' enough for human comprehension. It is data in a volume and format that makes it accessible, informative and actionable. The term "big data" is about machines and "small data" is about people. This is to say that eyewitness observations or five pieces of related data could be small data. Small data is what we used to think of as data. The only way to comprehend Big data is to reduce the data into small, visually-appealing objects representing various aspects of large data sets (such as histogram, charts, and scatter plots). Big Data is all about finding correlations, but Small Data is all about finding the causation, the reason why. A formal definition of small data has been proposed by Allen Bonde, former vice-president of Innovation at Actuate - now part of OpenText: "Small data connects people with timely, meaningful insights (derived from big data and/or “local” sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks." Another definition of small data is: The small set of specific attributes produced by the Internet of Things. These are typically a small set of sensor data such as temperature, wind speed, vibration and status. It was estimated (2016) that “If one takes the top 100 biggest innovations of our time, perhaps around 60% to 65% percent are really based on Small Data.” as Martin Lindstrom puts it. Small data includes everything from Snapchat to simple objects such as the post-it note. Lindstrom believes we become so focused on Big-Data that we tend to forget about more basic concepts and creativity. Lindstrom defines Small Data "as seemingly insignificant observations you identify in consumers’ homes, is everything from how you place your shoes on how you hang your paintings". He thus considers that one should perfectly master the basic (Small Data) in order to mine and find correlations. == Academic Recognition and Methodology == The growing significance of "small data" as a distinct field of inquiry was highlighted by the 2024 Thematic Einstein Semester (TES) on Small Data Analysis, hosted by the Berlin Mathematics Research Center MATH+. A central focus of this semester was the transition from theoretical analysis to practical decision-making. Because small data sets are primarily used to drive specific actions, the presentation of results becomes an essential methodological step. The semester’s findings emphasized that while small data may lack volume, it often contains a high density of "many possible interpretations." Consequently, the final conference of the TES was structured around the pillars of interpretation, explanation, and knowledge gain. Participants sought to develop new mathematical and methodical representations that could accurately depict this wealth of interpretative possibilities. This work underscores that analyzing small data is not purely a computational task; it requires a robust interface between mathematics and diverse disciplines to ensure that insights are both contextually grounded and scientifically rigorous. == Uses in business == === Marketing === Bonde has written about the topic for Forbes, Direct Marketing News, CMO.com and other publications. According to Martin Lindstrom, in his book, Small Data: "{In customer research, small data is} Seemingly insignificant behavioural observations containing very specific attributes pointing towards an unmet customer need. Small data is the foundation for breakthrough ideas or completely new ways to turnaround brands." His approach is based on the combination of the observation of small samples with intuition. Marketers can obtain market insights from gathering Small Data by engaging with and observing people in their own environments. In comparison to Big Data, Small Data has the power to trigger emotions and to provide insights into the reasons behind the behaviours of customers. It may uncover detailed information on a person's extroversion or introversion, self-confidence, whether one is having problems in his/her relationship, etc. According to Lindstrom, relationships among people and customer segments are organized around four criteria: Climate: It reveals for example how a person's environment affects their diet. Rulership: The power or government in charge Religion: The prevalence of religion in a country, depending on its influence, indicates whether a person's decision making process is impacted by their belief system. Tradition: Cultural norms influence people's behaviors and interactions. Many companies underestimate the power of Small Data, using samples of millions of consumers instead of recognizing the value of closely observing small samples in their market research. In his book, Lindstrom defines "7Cs", which companies should consider in the attempt to derive meaningful customer insights and market trends through small data from their customers: Collecting: Understanding the manner in which observations are translated inside a home. Clues: Uncovering other distinctive emotional reflections that can be observed. Connecting: Identifying the consequences of emotional behaviour. Causation: Understanding what emotions are being evoked. Correlation: Identifying the initial date of appearance of the behaviour or emotion. Compensation: Identifying the unmet or unfulfilled desire. Concept: Defining the “big idea” compensation for the identified consumer need. Some of Lindstrom's clients such as Lowes Foods looked at data in a different way and actually chose to live with the customer. “As you enter their store, they have now created an amazing community where every staff member acts in a character mood, based on Small Data”. The supermarket made everything it can to make the customer feel at home. All the behaviours of employees are inspired by customer feedbacks gathered from interviews directly done at customer’s home. === Healthcare === Researchers at Cornell University started developing applications to monitor health problems in patients, based on small data. This is an initiative of Cornell's Small Data Lab, in close cooperation with Weill Cornell Medicine College, led by Deborah Estrin. The Small Data Lab developed a series of apps, focusing not only on gathering data from patients' pain but also tracking habits in areas such as grocery shopping. In the case of patients with rheumatoid arthritis for example, which has flares and remissions that do not follow a particular cycle, the app gathers information passively, thus allowing to forecast when a flare might be coming up based on small changes in behaviour. Other apps developed also include monitoring online grocery shopping, to use this information from every user to adapt their groceries to the recommendations of nutritionists, or monitoring email language to identify patterns that might indicate "fluctuations in cognitive performance, fatigue, side effects of medication or poor sleep, and other conditions and treatments that are typically self-reported and self-medicated". === Postal Service === The United States Postal Service (USPS) used optical character recognition (OCR) to automatically read and process 98% of all hand-addressed mail and 99.5% of machine-printed mail. By combining this technology with its small data sample of US zip codes, the USPS can now process more than 36,000 pieces of mail per hour. === Aerospace === In 2015, Boeing established the analytics lab for aerospace data in cooperation with the Carnegie Mellon University to leverage the university's leadership in machine learning, language technologies and data analytics. One of the initiatives projects aims to by standardize maintenance logs using AI to dramatically reduce costs. Currently, there is no standardized procedure to document maintenance logs leading to small but highly unstructured data sets. As a result, it becomes highly difficult for maintenance workers to translate these variations in maintenance logs within a short period of time. However, with AI and a narrow data set of common aircraft maintenance terminology, it becomes possible to dynamically translate these logs in real time. By using AI to enhance the speed and accuracy of the airline maintenance workflow, airlines stand to save billions according to the Harvard Business Review.

    Read more →
  • Web data integration

    Web data integration

    Web data integration (WDI) is the process of aggregating and managing data from different websites into a single, homogeneous workflow. This process includes data access, transformation, mapping, quality assurance and fusion of data. Data that is sourced and structured from websites is referred to as "web data". WDI is an extension and specialization of data integration that views the web as a collection of heterogeneous databases. Data integration techniques in the context of the web, forms the foundation for businesses taking advantage of data available on the ever-increasing number of publicly-accessible websites. Corporate spending on this area amounted to about USD 2.5bn in 2017, and it is expected that by 2020 the market will reach almost USD 7bn.

    Read more →
  • AI: When a Robot Writes a Play

    AI: When a Robot Writes a Play

    AI: When a Robot Writes a Play (in Czech: AI: Když robot píše hru) is a 2021 experimental theatre play, where 90% of its script was automatically generated by artificial intelligence (the GPT-2 language model). The play is in Czech language, but an English version of the script also exists. == Creation == The play is the first result of the THEaiTRE research project, aiming to commemorate the centenary of the R.U.R. play by Karel Čapek by investigating to what extent artificial intelligence could be used to create theatre play scripts. The script of the play was created using the THEaiTRobot tool, based on the GPT-2 language model. First, the play dramaturge, David Košťák, described the initial setting of each scene in a few sentences, and wrote the first line for each character. Next, THEaiTRobot suggested a continuation of the script, which the dramaturge could use, reject, or use part of it and let the tool generate a new continuation. Another option was to manually insert another line or a scenic remark. The script was generated in English and was automatically translated to Czech by the state-of-the-art CUBBITT machine translation tool. The resulting script was then further post-edited by the dramaturge. The resulting script was made freely available for non-commercial use both in English and in Czech, with marked manually inserted texts and manual edits. The analysis shows that 90% of the English script is automatically generated, with 10% manually written or manually post-edited. In the Czech script, a larger amount of edits were made, but the analysis claims that these additional edits are corrections of errors of the automated translation and stylistic corrections which do not change the meaning of the lines as represented by the English script, but rather bring the Czech script closer to the English one. == Characters == The play contains 9 characters. The Robot appears in all the scenes, while each of the other characters appears in only one scene. Robot – The lead character, a male humanoid robot. Master – An old man, the creator of the Robot. Boy – A schoolboy. Masseuse – A sex worker in a brothel. Stranger – An engineer. Man. Psychologist. Administrator – A female clerk at an employment agency. Actress – A film actress and a model in a robot-like costume. == Plot == The play is composed of 8 scenes. It tells the story of a humanoid robot, who encounters 8 other characters and engages into various typically human situations and activities, related to death, love, sex, violence, etc. The individual scenes are not tightly linked, but there are some linking points, such as the central character of the robot or some repeated and developing themes, such as the robot's search for love. The scenes often contain some absurd turns and it is often hard to find sense in them. It is therefore a very complicated piece interpretationally, requiring the director and the actors to invest a lot of effort and creativity in finding a meaningful interpretation which would not deviate from the script. In the interpretation by Švanda theatre, who premiered the play and who also participated on the creation of the script, the scenes typically contain non-verbally expressed content which can add a lot to the meaning of the scene compared to what is contained in the actual script (as the script only contains the lines said by the characters). === Scene 1: Death === The play opens by the Robot parting with his dying Master. The Master gives the Robot several last lessons and talks with him about death, soul, and love. === Scene 2: Sense of Humour === In the second scene, the Robot meets a sad and angry Boy, who complains that he wants to go to school, that his girlfriend is crazy, that he wants to buy a car, etc. The Robot tries to help the Boy by giving him advice, but the Boy's reactions are quite negative and irritated. The Boy then repeatedly asks the Robot to tell him a joke; the Robot keeps refusing, but ultimately tells the following joke: When you are dead. When your children are dead. When your grandchildren are dead, I will be still alive. === Scene 3: Nightclub === The Robot wants to feel pleasure, so he goes to a "night club" (a brothel), where he meets a "Masseuse" (a prostitute). The Robot is initially "a bit cold", but eventually manages to enjoy the experience and falls in love with the Masseuse. In the Švanda theatre performance, the Robot and the Masseuse seem to have a sort of virtual sex without touching each other, reminiscent of the sex scene in Demolition Man. === Scene 4: Fear of the Dark === It is the night. The Robot is standing under a lamp, unable to move away from the light as he finds that he is afraid of the dark. He meets a Stranger, an engineer who tells him that robots don't have feelings and that people cannot be trusted, and keeps hurting him. In the Švanda theatre performance, the Man repeatedly zaps the Robot with some kind of electric pulse. === Scene 5: Killer Robot === A Man approaches the Robot and repeatedly asks him to kill him. Instead, the Robot sticks a finger into the Man's anus, which leads to an argument between the Man and the Robot. === Scene 6: Burn Out === The Robot meets a Psychologist, who keeps asking him lots of questions regarding his life, burnout feeling, love, relationships, and emotions. They also talk about the Robot using a device called emotion machine which helps him to get rid of stress. === Scene 7: Search for Job === The Robot comes to an employment agency. He meets an Administrator and asks her to help him find a job. He expresses the wish to become an actor, and talks about his experience as a clown. He reveals his name to be Troy McClure, which is a character from The Simpsons who is an actor. In the Švanda theatre performance, the Administrator starts to seduce the Robot once his name is revealed, which he keeps ignoring; the Administrator then becomes irritated. === Scene 8: Love at First Sight === The Robot meets a human Actress in a robotic costume and falls in love with her immediately. The Actress is first reluctant, but the Robot manages to seduce her and she also falls in love with him. The Robot tells her about a binary world, in which he lives and where he will also take her. Ultimately, the Actress agrees, and the whole play concludes by the Robot and the Actress promising each to other to always be together. In the Švanda theatre performance, the Robot does not have a physical body in this scene, we can only hear his voice and see a pulsating light (based on the line in the script where the Robot says: "I have no body. So I don't need to wear clothes. You can't see me, you only hear me."), and the Actress eventually also agrees to lose her physical body so that she can be with the Robot forever. == Theatrical performances == The play premiered on 26 February 2021 in Švanda Theatre in Prague, Czech Republic, directed by Daniel Hrbek. Due to the COVID-19 pandemic, the play was not played in front of a live audience, but it was broadcast online, in Czech language with English subtitles. The play was followed by a panel discussion by the project members and experts on artificial intelligence. The premiere was viewed by 13,498 spectators worldwide. A short trailer of the premiere is available on YouTube. In 2021, after the opening of the theatres in the Czech Republic to spectators, the play can be viewed at Švanda Theatre. The performance takes approximately 60 minutes, and is followed by a discussion of the creators with the audience. The derniere is planned for 4 February 2023. == Reception == The play received a number of reviews, both in its country of origin as well as internationally. It is praised as first of its kind, although some reviewers note the similarity to previous works, such as the musical Beyond the Fence, the play Lifestyle of the Richard and Family, or the short movie Sunspring; however, these works used less advanced technology, and either were very short (Sunspring) or necessitated a larger amount of human interventions. The reviewers note that the script is far from perfect, with many inconsistencies and nonsensical parts, and conclude that the technology is definitely not yet ready to replace human authors; however, some find some parts of the script frighteningly human-like. The amount of human intervention is a somewhat controversial topic, with some reviewers finding the human influence too large (especially in interpreting the script and putting the play on scene), while others feel that a greater amount of human intervention would have been favorable as this could greatly improve the quality of the play. The reviews also frequently comment on the amount of sex, violence and strong language in the play; this can be attributed to the method used for creating the script, where the GPT-2 language model reflects topics and language common in the human-written articles on the internet that were used to train the model. Furthermore, some r

    Read more →
  • Kuki AI

    Kuki AI

    Kuki is an embodied AI bot designed for usage in the metaverse. Formerly known as Mitsuku, Kuki is a chatbot created from the Pandorabots framework. The bot has won the Loebner Prize 5 times. == Features == Kuki claims to be an 18-year-old female chatbot from the Metaverse, and the developers have stated she has been worked on since 2005. Early work by one of the company's co-founders inspired the Spike Jonze movie Her. As of 2015, she conversed, on average, in excess of a quarter of a million times daily, and it was estimated 5 million unique users had interacted with her between 2016 and 2020. == Virtual talent, model, and influencer == Kuki has appeared as a Virtual Model in Vogue Business and at Crypto Fashion Week where she modelled NFTs and spoke about the future of digital fashion. In 2021, Kuki modelled five digital looks from emerging Vogue Talents designers for Italian Vogue, that sold out as NFTs in under an hour. Kuki has also modeled for H&M on Instagram in a digital campaign that resulted in an "11x increase in ad recall" per a case study by Meta. == Awards == As of 2019, Kuki had been awarded the Loebner Prize five times, more than any other entrant. In 2020, Kuki competed against Facebook AI's Blenderbot in a 24/7 verbal sparring match called "Bot Battle", winning 79% of the audience vote.

    Read more →
  • Tuple

    Tuple

    In mathematics, a tuple is a finite sequence (or ordered list) of numbers. More generally, it is a sequence of mathematical objects, called the elements of the tuple. An n-tuple is a tuple of n elements, where n is a non-negative integer. There is only one 0-tuple, called the empty tuple. A 1-tuple and a 2-tuple are commonly called a singleton and an ordered pair, respectively. The term "infinite tuple" is occasionally used for "infinite sequences". Tuples are usually written by listing the elements within parentheses "( )" and separated by commas; for example, (2, 7, 4, 1, 7) denotes a 5-tuple. Other types of brackets are sometimes used, although they may have a different meaning. An n-tuple can be formally defined as the image of a function that has the set of the first n natural numbers as its domain (1, 2, ..., n). Tuples may be also defined from ordered pairs by a recurrence starting from an ordered pair; indeed, an n-tuple can be identified with the ordered pair of its (n − 1) first elements and its nth element, for example, ( ( ( 1 , 2 ) , 3 ) , 4 ) = ( 1 , 2 , 3 , 4 ) {\displaystyle \left(\left(\left(1,2\right),3\right),4\right)=\left(1,2,3,4\right)} . In computer science, tuples come in many forms. Most typed functional programming languages implement tuples directly as product types, tightly associated with algebraic data types, pattern matching, and destructuring assignment. Many programming languages offer an alternative to tuples, known as record types, featuring unordered elements accessed by label. A few programming languages combine ordered tuple product types and unordered record types into a single construct, as in C structs and Haskell records. Relational databases may formally identify their rows (records) as tuples. Tuples also occur in relational algebra; when programming the semantic web with the Resource Description Framework (RDF); in linguistics; and in philosophy. == Etymology == The term originated as an abstraction of the sequence: single, couple/double, triple, quadruple, quintuple, sextuple, septuple, octuple, ..., n‑tuple, ..., where the prefixes are taken from the Latin names of the numerals. The unique 0-tuple is called the null tuple or empty tuple. A 1‑tuple is called a single (or singleton), a 2‑tuple is called an ordered pair or couple, and a 3‑tuple is called a triple (or triplet). The number n can be any nonnegative integer. For example, a complex number can be represented as a 2‑tuple of reals, a quaternion can be represented as a 4‑tuple, an octonion can be represented as an 8‑tuple, and a sedenion can be represented as a 16‑tuple. Although these uses treat ‑tuple as the suffix, the original suffix was ‑ple as in "triple" (three-fold) or "decuple" (ten‑fold). This originates from medieval Latin plus (meaning "more") related to Greek ‑πλοῦς, which replaced the classical and late antique ‑plex (meaning "folded"), as in "duplex". == Properties == The general rule for the identity of two n-tuples is ( a 1 , a 2 , … , a n ) = ( b 1 , b 2 , … , b n ) {\displaystyle (a_{1},a_{2},\ldots ,a_{n})=(b_{1},b_{2},\ldots ,b_{n})} if and only if a 1 = b 1 , a 2 = b 2 , … , a n = b n {\displaystyle a_{1}=b_{1},{\text{ }}a_{2}=b_{2},{\text{ }}\ldots ,{\text{ }}a_{n}=b_{n}} . Thus a tuple has properties that distinguish it from a set: A tuple may contain multiple instances of the same element, so tuple ( 1 , 2 , 2 , 3 ) ≠ ( 1 , 2 , 3 ) {\displaystyle (1,2,2,3)\neq (1,2,3)} ; but set { 1 , 2 , 2 , 3 } = { 1 , 2 , 3 } {\displaystyle \{1,2,2,3\}=\{1,2,3\}} . Tuple elements are ordered: tuple ( 1 , 2 , 3 ) ≠ ( 3 , 2 , 1 ) {\displaystyle (1,2,3)\neq (3,2,1)} , but set { 1 , 2 , 3 } = { 3 , 2 , 1 } {\displaystyle \{1,2,3\}=\{3,2,1\}} . A tuple has a finite number of elements, while a set or a multiset may have an infinite number of elements. == Definitions == There are several definitions of tuples that give them the properties described in the previous section. === Tuples as functions === The 0 {\displaystyle 0} -tuple may be identified as the empty function. For n ≥ 1 , {\displaystyle n\geq 1,} the n {\displaystyle n} -tuple ( a 1 , … , a n ) {\displaystyle \left(a_{1},\ldots ,a_{n}\right)} may be identified with the surjective function F : { 1 , … , n } → { a 1 , … , a n } {\displaystyle F~:~\left\{1,\ldots ,n\right\}~\to ~\left\{a_{1},\ldots ,a_{n}\right\}} with domain domain ⁡ F = { 1 , … , n } = { i ∈ N : 1 ≤ i ≤ n } {\displaystyle \operatorname {domain} F=\left\{1,\ldots ,n\right\}=\left\{i\in \mathbb {N} :1\leq i\leq n\right\}} and with codomain codomain ⁡ F = { a 1 , … , a n } , {\displaystyle \operatorname {codomain} F=\left\{a_{1},\ldots ,a_{n}\right\},} that is defined at i ∈ domain ⁡ F = { 1 , … , n } {\displaystyle i\in \operatorname {domain} F=\left\{1,\ldots ,n\right\}} by F ( i ) := a i . {\displaystyle F(i):=a_{i}.} That is, F {\displaystyle F} is the function defined by 1 ↦ a 1 ⋮ n ↦ a n {\displaystyle {\begin{alignedat}{3}1\;&\mapsto &&\;a_{1}\\\;&\;\;\vdots &&\;\\n\;&\mapsto &&\;a_{n}\\\end{alignedat}}} in which case the equality ( a 1 , a 2 , … , a n ) = ( F ( 1 ) , F ( 2 ) , … , F ( n ) ) {\displaystyle \left(a_{1},a_{2},\dots ,a_{n}\right)=\left(F(1),F(2),\dots ,F(n)\right)} necessarily holds. Tuples as sets of ordered pairs Functions are commonly identified with their graphs, which is a certain set of ordered pairs. Indeed, many authors use graphs as the definition of a function. Using this definition of "function", the above function F {\displaystyle F} can be defined as: F := { ( 1 , a 1 ) , … , ( n , a n ) } . {\displaystyle F~:=~\left\{\left(1,a_{1}\right),\ldots ,\left(n,a_{n}\right)\right\}.} === Tuples as nested ordered pairs === Another way of modeling tuples in set theory is as nested ordered pairs. This approach assumes that the notion of ordered pair has already been defined. The 0-tuple (i.e. the empty tuple) is represented by the empty set ∅ {\displaystyle \emptyset } . An n-tuple, with n > 0, can be defined as an ordered pair of its first entry and an (n − 1)-tuple (which contains the remaining entries when n > 1): ( a 1 , a 2 , a 3 , … , a n ) = ( a 1 , ( a 2 , a 3 , … , a n ) ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=(a_{1},(a_{2},a_{3},\ldots ,a_{n}))} This definition can be applied recursively to the (n − 1)-tuple: ( a 1 , a 2 , a 3 , … , a n ) = ( a 1 , ( a 2 , ( a 3 , ( … , ( a n , ∅ ) … ) ) ) ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=(a_{1},(a_{2},(a_{3},(\ldots ,(a_{n},\emptyset )\ldots ))))} Thus, for example: ( 1 , 2 , 3 ) = ( 1 , ( 2 , ( 3 , ∅ ) ) ) ( 1 , 2 , 3 , 4 ) = ( 1 , ( 2 , ( 3 , ( 4 , ∅ ) ) ) ) {\displaystyle {\begin{aligned}(1,2,3)&=(1,(2,(3,\emptyset )))\\(1,2,3,4)&=(1,(2,(3,(4,\emptyset ))))\\\end{aligned}}} A variant of this definition starts "peeling off" elements from the other end: The 0-tuple is the empty set ∅ {\displaystyle \emptyset } . For n > 0: ( a 1 , a 2 , a 3 , … , a n ) = ( ( a 1 , a 2 , a 3 , … , a n − 1 ) , a n ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=((a_{1},a_{2},a_{3},\ldots ,a_{n-1}),a_{n})} This definition can be applied recursively: ( a 1 , a 2 , a 3 , … , a n ) = ( ( … ( ( ( ∅ , a 1 ) , a 2 ) , a 3 ) , … ) , a n ) {\displaystyle (a_{1},a_{2},a_{3},\ldots ,a_{n})=((\ldots (((\emptyset ,a_{1}),a_{2}),a_{3}),\ldots ),a_{n})} Thus, for example: ( 1 , 2 , 3 ) = ( ( ( ∅ , 1 ) , 2 ) , 3 ) ( 1 , 2 , 3 , 4 ) = ( ( ( ( ∅ , 1 ) , 2 ) , 3 ) , 4 ) {\displaystyle {\begin{aligned}(1,2,3)&=(((\emptyset ,1),2),3)\\(1,2,3,4)&=((((\emptyset ,1),2),3),4)\\\end{aligned}}} === Tuples as nested sets === Using Kuratowski's representation for an ordered pair, the second definition above can be reformulated in terms of pure set theory: The 0-tuple (i.e. the empty tuple) is represented by the empty set ∅ {\displaystyle \emptyset } ; Let x {\displaystyle x} be an n-tuple ( a 1 , a 2 , … , a n ) {\displaystyle (a_{1},a_{2},\ldots ,a_{n})} , and let x → b ≡ ( a 1 , a 2 , … , a n , b ) {\displaystyle x\rightarrow b\equiv (a_{1},a_{2},\ldots ,a_{n},b)} . Then, x → b ≡ { { x } , { x , b } } {\displaystyle x\rightarrow b\equiv \{\{x\},\{x,b\}\}} . (The right arrow, → {\displaystyle \rightarrow } , could be read as "adjoined with".) In this formulation: ( ) = ∅ ( 1 ) = ( ) → 1 = { { ( ) } , { ( ) , 1 } } = { { ∅ } , { ∅ , 1 } } ( 1 , 2 ) = ( 1 ) → 2 = { { ( 1 ) } , { ( 1 ) , 2 } } = { { { { ∅ } , { ∅ , 1 } } } , { { { ∅ } , { ∅ , 1 } } , 2 } } ( 1 , 2 , 3 ) = ( 1 , 2 ) → 3 = { { ( 1 , 2 ) } , { ( 1 , 2 ) , 3 } } = { { { { { { ∅ } , { ∅ , 1 } } } , { { { ∅ } , { ∅ , 1 } } , 2 } } } , { { { { { ∅ } , { ∅ , 1 } } } , { { { ∅ } , { ∅ , 1 } } , 2 } } , 3 } } {\displaystyle {\begin{array}{lclcl}()&&&=&\emptyset \\&&&&\\(1)&=&()\rightarrow 1&=&\{\{()\},\{(),1\}\}\\&&&=&\{\{\emptyset \},\{\emptyset ,1\}\}\\&&&&\\(1,2)&=&(1)\rightarrow 2&=&\{\{(1)\},\{(1),2\}\}\\&&&=&\{\{\{\{\emptyset \},\{\emptyset ,1\}\}\},\\&&&&\{\{\{\emptyset \},\{\emptyset ,1\}\},2\}\}\\&&&&\\(1,2,3)&=&(1,2)\rightarrow 3&=&\{\{(1,2)\},\{(1,2),3\}\}\\&&&=&\{\{\{\{\{\{\empty

    Read more →
  • Subject indexing

    Subject indexing

    Subject indexing is the act of describing or classifying a document by index terms, keywords, or other symbols in order to indicate what different documents are about, to summarize their contents or to increase findability. In other words, the objective is to identify and describe the subject of documents. Indexes are constructed, separately, on three distinct levels: terms in a document, such as a book; objects in a collection, such as a library; and documents (such as books and articles) within a field of knowledge. Subject indexing is used in information retrieval especially to create bibliographic indexes to retrieve documents on a particular subject. Examples of academic indexing services are Zentralblatt MATH, Chemical Abstracts, and PubMed. The index terms were mostly assigned by experts but author keywords are also common. The process of indexing begins with any analysis of the subject of the document. The indexer must then identify terms that appropriately identify the subject, either by extracting words directly from the document or assigning words from a controlled vocabulary. The terms in the index are then presented in a systematic order. Indexers must decide how many terms to include and how specific the terms should be. Together this gives a depth of indexing. == Subject analysis == The first step in indexing is to decide on the subject matter of the document. In manual indexing, the indexer would consider the subject matter in terms of answer to a set of questions such as "Does the document deal with a specific product, condition or phenomenon?". As the analysis is influenced by the knowledge and experience of the indexer, it follows that two indexers may analyze the content differently and so come up with different index terms. This will impact on the success of retrieval. === Automatic vs. manual subject analysis === Automatic indexing follows set processes of analyzing frequencies of word patterns and comparing results to other documents in order to assign to subject categories. This requires no understanding of the material being indexed. This leads to more uniform indexing but at the expense of the true meaning being interpreted. A computer program will not understand the meaning of statements and may therefore fail to assign some relevant terms or assign incorrectly. Human indexers focus their attention on certain parts of the document such as the title, abstract, summary and conclusions, as analyzing the full text in depth is costly and time-consuming. An automated system takes away the time limit and allows the entire document to be analyzed, but also has the option to be directed to particular parts of the document. == Term selection == The second stage of indexing involves the translation of the subject analysis into a set of index terms. This can involve extracting from the document or assigning from a controlled vocabulary. With the ability to conduct a full text search widely available, many people have come to rely on their own expertise in conducting information searches and full text search has become very popular. Subject indexing and its experts, professional indexers, catalogers, and librarians, remains crucial to information organization and retrieval. These experts understand controlled vocabularies and are able to find information that cannot be located by full text search. The cost of expert analysis to create subject indexing is not easily compared to the cost of hardware, software and labor to manufacture a comparable set of full-text, fully searchable materials. With new web applications that allow every user to annotate documents, social tagging has gained popularity especially in the Web. One application of indexing, the book index, remains relatively unchanged despite the information revolution. === Extraction/Derived indexing === Extraction indexing involves taking words directly from the document. It uses natural language and lends itself well to automated techniques where word frequencies are calculated and those with a frequency over a pre-determined threshold are used as index terms. A stop-list containing common words (such as "the", "and") would be referred to and such stop words would be excluded as index terms. Automated extraction indexing may lead to loss of meaning of terms by indexing single words as opposed to phrases. Although it is possible to extract commonly occurring phrases, it becomes more difficult if key concepts are inconsistently worded in phrases. Automated extraction indexing also has the problem that, even with use of a stop-list to remove common words, some frequent words may not be useful for allowing discrimination between documents. For example, the term glucose is likely to occur frequently in any document related to diabetes. Therefore, use of this term would likely return most or all the documents in the database. Post-coordinated indexing where terms are combined at the time of searching would reduce this effect but the onus would be on the searcher to link appropriate terms as opposed to the information professional. In addition terms that occur infrequently may be highly significant for example a new drug may be mentioned infrequently but the novelty of the subject makes any reference significant. One method for allowing rarer terms to be included and common words to be excluded by automated techniques would be a relative frequency approach where frequency of a word in a document is compared to frequency in the database as a whole. Therefore, a term that occurs more often in a document than might be expected based on the rest of the database could then be used as an index term, and terms that occur equally frequently throughout will be excluded. Another problem with automated extraction is that it does not recognize when a concept is discussed but is not identified in the text by an indexable keyword. Since this process is based on simple string matching and involves no intellectual analysis, the resulting product is more appropriately known as a concordance than an index. === Assignment indexing === An alternative is assignment indexing where index terms are taken from a controlled vocabulary. This has the advantage of controlling for synonyms as the preferred term is indexed and synonyms or related terms direct the user to the preferred term. This means the user can find articles regardless of the specific term used by the author and saves the user from having to know and check all possible synonyms. It also removes any confusion caused by homographs by inclusion of a qualifying term. A third advantage is that it allows the linking of related terms whether they are linked by hierarchy or association, e.g. an index entry for an oral medication may list other oral medications as related terms on the same level of the hierarchy but would also link to broader terms such as treatment. Assignment indexing is used in manual indexing to improve inter-indexer consistency as different indexers will have a controlled set of terms to choose from. Controlled vocabularies do not completely remove inconsistencies as two indexers may still interpret the subject differently. == Index presentation == The final phase of indexing is to present the entries in a systematic order. This may involve linking entries. In a pre-coordinated index the indexer determines the order in which terms are linked in an entry by considering how a user may formulate their search. In a post-coordinated index, the entries are presented singly and the user can link the entries through searches, most commonly carried out by computer software. Post-coordination results in a loss of precision in comparison to pre-coordination. == Depth of indexing == Indexers must make decisions about what entries should be included and how many entries an index should incorporate. The depth of indexing describes the thoroughness of the indexing process with reference to exhaustivity and specificity. === Exhaustivity === An exhaustive index is one which lists all possible index terms. Greater exhaustivity gives a higher recall, or more likelihood of all the relevant articles being retrieved, however, this occurs at the expense of precision. This means that the user may retrieve a larger number of irrelevant documents or documents which only deal with the subject in little depth. In a manual system a greater level of exhaustivity brings with it a greater cost as more man-hours are required. The additional time taken in an automated system would be much less significant. At the other end of the scale, in a selective index only the most important aspects are covered. Recall is reduced in a selective index as if an indexer does not include enough terms, a highly relevant article may be overlooked. Therefore, indexers should strive for a balance and consider what the document may be used. They may also have to consider the implications of time and expense. === Specificity === The specificity describes how closel

    Read more →
  • Novell File Reporter

    Novell File Reporter

    Novell File Reporter (NFR) is software that allows network administrators to identify files stored on the network and generates reports regarding the size of individual files, file type, when files were last accessed, and where duplicates exist. Additionally, the File Reporter tracks storage volume capacity and usage. It is a component of the Novell File Management Suite. == How it works == Novell File Reporter examines and reports on terabytes of data via a central reporting engine (NFR Engine) and distributed agents (NFR Agents). The NFR Engine schedules the scans of file instances conducted by NFR Agents, processes and compiles the scans for reporting purposes, and provides report information to the user interface. In addition to the standard reports it can generate, the NFR Engine can also produce "trigger reports" in response to specific events (a server volume crossing a capacity threshold, for example). Accordingly, the NFR Engine monitors the data gathered by the NFR Agents in order to identify these "triggers." The NFR Engine when working in either eDirectory or Active Directory connects to the directory via a Directory Services Interface (DSI) and thus can monitor and check file permissions.

    Read more →
  • Connectionist expert system

    Connectionist expert system

    Connectionist expert systems are artificial neural network (ANN) based expert systems where the ANN generates inferencing rules e.g., fuzzy-multi layer perceptron where linguistic and natural form of inputs are used. Apart from that, rough set theory may be used for encoding knowledge in the weights better and also genetic algorithms may be used to optimize the search solutions better. Symbolic reasoning methods may also be incorporated (see hybrid intelligent system). (Also see expert system, neural network, clinical decision support system.)

    Read more →
  • Enterprise Objects Framework

    Enterprise Objects Framework

    The Enterprise Objects Framework, or simply EOF, was introduced by NeXT in 1994 as a pioneering object-relational mapping product for its NeXTSTEP and OpenStep development platforms. EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code. EOF enjoyed some niche success in the mid-1990s among financial institutions who were attracted to the rapid application development advantages of NeXT's object-oriented platform. Since Apple Inc's merger with NeXT in 1996, EOF has evolved into a fully integrated part of WebObjects, an application server also originally from NeXT. Many of the core concepts of EOF re-emerged as part of Core Data, which further abstracts the underlying data formats to allow it to be based on non-SQL stores. == History == In the early 1990s NeXT Computer recognized that connecting to databases was essential to most businesses and yet also potentially complex. Every data source has a different data-access language (or API), driving up the costs to learn and use each vendor's product. The NeXT engineers wanted to apply the advantages of object-oriented programming, by getting objects to "talk" to relational databases. As the two technologies are very different, the solution was to create an abstraction layer, insulating developers from writing the low-level procedural code (SQL) specific to each data source. The first attempt came in 1992 with the release of Database Kit (DBKit), which wrapped an object-oriented framework around any database. Unfortunately, NEXTSTEP at the time was not powerful enough and DBKit had serious design flaws. NeXT's second attempt came in 1994 with the Enterprise Objects Framework (EOF) version 1, a complete rewrite that was far more modular and OpenStep compatible. EOF 1.0 was the first product released by NeXT using the Foundation Kit and introduced autoreleased objects to the developer community. The development team at the time was only four people: Jack Greenfield, Rich Williamson, Linus Upson and Dan Willhite. EOF 2.0, released in late 1995, further refined the architecture, introducing the editing context. At that point, the development team consisted of Dan Willhite, Craig Federighi, Eric Noyau and Charly Kleissner. EOF achieved a modest level of popularity in the financial programming community in the mid-1990s, but it would come into its own with the emergence of the World Wide Web and the concept of web applications. It was clear that EOF could help companies plug their legacy databases into the Web without any rewriting of that data. With the addition of frameworks to do state management, load balancing and dynamic HTML generation, NeXT was able to launch the first object-oriented Web application server, WebObjects, in 1996, with EOF at its core. In 2000, Apple Inc. (which had merged with NeXT) officially dropped EOF as a standalone product, meaning that developers would be unable to use it to create desktop applications for the forthcoming Mac OS X. It would, however, continue to be an integral part of a major new release of WebObjects. WebObjects 5, released in 2001, was significant for the fact that its frameworks had been ported from their native Objective-C programming language to the Java language. Critics of this change argue that most of the power of EOF was a side effect of its Objective-C roots, and that EOF lost the beauty or simplicity it once had. Third-party tools, such as EOGenerator, help fill the deficiencies introduced by Java (mainly due to the loss of categories). The Objective-C code base was re-introduced with some modifications to desktop application developers as Core Data, part of Apple's Cocoa API, with the release of Mac OS X Tiger in April 2005. == How EOF works == Enterprise Objects provides tools and frameworks for object-relational mapping. The technology specializes in providing mechanisms to retrieve data from various data sources, such as relational databases via JDBC and JNDI directories, and mechanisms to commit data back to those data sources. These mechanisms are designed in a layered, abstract approach that allows developers to think about data retrieval and commitment at a higher level than a specific data source or data source vendor. Central to this mapping is a model file (an "EOModel") that you build with a visual tool — either EOModeler, or the EOModeler plug-in to Xcode. The mapping works as follows: Database tables are mapped to classes. Database columns are mapped to class attributes. Database rows are mapped to objects (or class instances). You can build data models based on existing data sources or you can build data models from scratch, which you then use to create data structures (tables, columns, joins) in a data source. The result is that database records can be transposed into Java objects. The advantage of using data models is that applications are isolated from the idiosyncrasies of the data sources they access. This separation of an application's business logic from database logic allows developers to change the database an application accesses without needing to change the application. EOF provides a level of database transparency not seen in other tools and allows the same model to be used to access different vendor databases and even allows relationships across different vendor databases without changing source code. Its power comes from exposing the underlying data sources as managed graphs of persistent objects. In simple terms, this means that it organizes the application's model layer into a set of defined in-memory data objects. It then tracks changes to these objects and can reverse those changes on demand, such as when a user performs an undo command. Then, when it is time to save changes to the application's data, it archives the objects to the underlying data sources. === Using Inheritance === In designing Enterprise Objects developers can leverage the object-oriented feature known as inheritance. A Customer object and an Employee object, for example, might both inherit certain characteristics from a more generic Person object, such as name, address, and phone number. While this kind of thinking is inherent in object-oriented design, relational databases have no explicit support for inheritance. However, using Enterprise Objects, you can build data models that reflect object hierarchies. That is, you can design database tables to support inheritance by also designing enterprise objects that map to multiple tables or particular views of a database table. == Enterprise Objects (EOs) == An Enterprise Object is analogous to what is often known in object-oriented programming as a business object — a class which models a physical or conceptual object in the business domain (e.g. a customer, an order, an item, etc.). What makes an EO different from other objects is that its instance data maps to a data store. Typically, an enterprise object contains key-value pairs that represent a row in a relational database. The key is basically the column name, and the value is what was in that row in the database. So it can be said that an EO's properties persist beyond the life of any particular running application. More precisely, an Enterprise Object is an instance of a class that implements the com.webobjects.eocontrol.EOEnterpriseObject interface. An Enterprise Object has a corresponding model (called an EOModel) that defines the mapping between the class's object model and the database schema. However, an enterprise object doesn't explicitly know about its model. This level of abstraction means that database vendors can be switched without it affecting the developer's code. This gives Enterprise Objects a high degree of reusability. == EOF and Core Data == Despite their common origins, the two technologies diverged, with each technology retaining a subset of the features of the original Objective-C code base, while adding some new features. === Features Supported Only by EOF === EOF supports custom SQL; shared editing contexts; nested editing contexts; and pre-fetching and batch faulting of relationships, all features of the original Objective-C implementation not supported by Core Data. Core Data also does not provide the equivalent of an EOModelGroup—the NSManagedObjectModel class provides methods for merging models from existing models, and for retrieving merged models from bundles. === Features Supported Only by Core Data === Core Data supports fetched properties; multiple configurations within a managed object model; local stores; and store aggregation (the data for a given entity may be spread across multiple stores); customization and localization of property names and validation warnings; and the use of predicates for property validation. These features of the original Objective-C implementation are not supported by the Java implementation.

    Read more →
  • Documentation science

    Documentation science

    Documentation science is the study of the recording and retrieval of information. It includes methods for storing, retrieving, and sharing of information captured on physical as well as digital documents. This field is closely linked to the fields of library science and information science but has its own theories and practices. The term documentation science was coined by Belgian lawyer and peace activist Paul Otlet. He is considered to be the forefather of information science. He along with Henri La Fontaine laid the foundations of documentation science as a field of study. Professionals in this field are called documentalists. Over the years, documentation science has grown to become a large and important field of study. Evolving from traditional practices like archiving and retrieval to modern theories about the nature of documents, novel methods for organizing digital information, and applications in libraries, research, healthcare, business, and technology and more. This field continues to evolve in the digital age. == Developments in documentation science == 1895: The International Institute of Bibliography (originally Institut International de Bibliographie, IIB) was established on 12 September 1895, in Brussels, Belgium by Paul Otlet and Henri La Fontaine. It aimed to catalog all recorded knowledge using a universal classification system now known as the Universal Decimal Classification (UDC). 1931: International Institute of Bibliography (originally Institut International de Bibliographie, IIB) was renamed The International Institute for Documentation, (Institut International de Documentation, IID). 1934: Paul Otlet envisioned a “radiated library,” a global network of interconnected documents accessible from anywhere via telecommunication. This early idea is now seen as a forerunner of the internet. 1937: American Documentation Institute was founded (1968 nameshift to American Society for Information Science). 1951: Suzanne Briet published Qu'est-ce que la documentation? where she proposed that “a document is evidence in support of a fact,” expanding the definition to include objects such as animals in zoos when they are part of a scientific study. This was a significant theoretical shift in defining documents. 1965-1990: Documentation departments were established, for example, large research libraries, online computer retrieval systems and more. The persons doing the searches were called documentalists. But with the appearance of first CD-ROM databases in the mid-1980s and later the internet in 1990s, these intermediary searches decreased and most such departments closed or merged with other departments. 1996: "Dokvit", Documentation Studies, was established in 1996 at the University of Tromsø in Norway. 2001: The Document Academy was established. It is an international network that celebrates documentation. It was conducted by The Program of Documentation Studies, University of Tromsø, Norway and The School of Information Management and Systems, UC Berkeley. 2003: The first Document Research Conference (DOCAM), a series of conferences made by the Document Academy. DOCAM '03 (2003) was held 13–15 August 2003 at The School of Information Management and Systems (SIMS) at the University of California, Berkeley. 2007: Michael Buckland, Ronald Day, and Birger Hjørland expanded the theoretical foundations of documentation science. They researched and explored documents to be social artifacts, the role of ideology in classification, and how documents influenced knowledge systems. 2010s: The concept of post-documentation or “documentality” began in the 2010s, which focused on how digital traces (e.g., tweets, logs) function as documents without traditional physical form. This led to new thinking in document theory. 2016–present: The Document Academy's DOCAM conferences have continued, offering ongoing developments in the theory and practice of documentation. Themes include affect, memory, activism, and born-digital records. 2017: The journal Information Research published special issues addressing “document theory,” including views on documentation in virtual environments and digital archives. 2020–present: The growth of research data management (RDM) and open science has made documentation practices central to data sharing, metadata standards, and reproducibility in scientific work. == Theoretical foundations == Documentation science has some deep theories that explain what a document is, how people use documents, and how they are organized. These concepts were introduced by scholars who have not only studied libraries, but also philosophy, language, and social sciences. Suzanne Briet described a document as “any material form of evidence” that is made to be used as proof or to share information. An antelope in a zoo, for example, can be a document because it is being studied, classified, and described. Documents are not just things or materials but are also shaped by society. Michael Buckland noted that documents have meaning only when people agree they are useful or valid as information. He explained a document becomes a document when someone decides to use it as evidence. Ronald Day wrote about how documentation is not neutral, it can be influenced by power, ideology, and politics. He claimed that classification systems, like how libraries organize books, are not just technical tools. They also show what kinds of knowledge are seen as more important than others. In recent years, new theories have been introduced, like “documentality” by Maurizio Ferraris. He proposed that a document does not have to be a paper or file, it can also be something digital like a tweet, a database entry, or a log file, as long as it leaves a trace that can be looked at later. This theory helps explain modern digital documents. == Methodologies and practice == Documentation science includes many methods that help people collect, organize, store, and find information. These practices are used in libraries, archives, research labs, companies, and now also in online systems. === Collecting and creating documents === In the past, documentation work included gathering books, articles, reports, and other printed materials. People created records of these materials manually, using catalog cards, indexes, or bibliographies. Paul Otlet’s work with the Universal Bibliographic Repertory is one example. He created millions of card entries to organize knowledge from around the world. Today, documents are not only created by humans. Computers and machines also generate documents, like log files, metadata, and sensor data. These need new tools and methods for collection and management. === Organizing information === Organizing documents has always been a foundational element of documentation science. Methods like classification (dividing things into groups) and indexing (making lists of topics or keywords) help individuals find what they need. A widely used system is the Universal Decimal Classification (UDC) developed by Otlet and La Fontaine. Another is the Library of Congress Classification (LCC) used in the majority of U.S. libraries. Indexing can be performed by humans or by software programs that read the text and add tags to documents. Metadata is also used to describe documents. Metadata is “data about data” like the title, author, date, and subject of a document. Standards like Dublin Core are used in digital libraries to keep metadata consistent. === Retrieval and access === One of the main objectives of documentation is helping users find the right document. This is called information retrieval. In the past, this meant using catalog drawers or printed indexes. Today, people use search engines, databases, and digital libraries. Modern retrieval tools use Boolean logic, ranking algorithms, and sometimes machine learning to show the most useful results first. This is part of what is studied in both documentation science and information retrieval. === Preservation and archiving === Documents require long-term storage. This is called preservation of documents. Printed documents can be damaged by light, pests, or even time on the other hand digital documents can be deemed worthless if formats become outdated or storage facilities fail. Archivists use methods like migration, which includes moving files to new formats, and emulation, which replicates obsolete systems, to preserve materials. These methods and tools are ever changing as new technologies develop. But the main objective of documentation has remained the same, which is to keep information safe, organized, and easy to find. == Documentation in the digital age == With the expansion of the internet, computers, and cloud storage, documents are no longer just books, papers, or reports. They can now be emails, tweets, videos, websites, databases, or even log files created by machines. === Born-digital documents === Many documents today are created directly in digital form. These are called born-digit

    Read more →
  • Algorithmic game theory

    Algorithmic game theory

    Algorithmic game theory (AGT) is an interdisciplinary field at the intersection of game theory and computer science, focused on understanding and designing algorithms for environments where multiple strategic agents interact. This research area combines computational thinking with economic principles to address challenges that emerge when algorithmic inputs come from self-interested participants. In traditional algorithm design, inputs are assumed to be fixed and reliable. However, in many real-world applications—such as online auctions, internet routing, digital advertising, and resource allocation systems—inputs are provided by multiple independent agents who may strategically misreport information to manipulate outcomes in their favor. AGT provides frameworks to analyze and design systems that remain effective despite such strategic behavior. The field can be approached from two complementary perspectives: Analysis: Evaluating existing algorithms and systems through game-theoretic tools to understand their strategic properties. This includes calculating and proving properties of Nash equilibria (stable states where no participant can benefit by changing only their own strategy), measuring price of anarchy (efficiency loss due to selfish behavior), and analyzing best-response dynamics (how systems evolve when players sequentially optimize their strategies). Design: Creating mechanisms and algorithms with both desirable computational properties and game-theoretic robustness. This sub-field, known as algorithmic mechanism design, develops systems that incentivize truthful behavior while maintaining computational efficiency. Algorithm designers in this domain must satisfy traditional algorithmic requirements (such as polynomial-time running time and good approximation ratio) while simultaneously addressing incentive constraints that ensure participants act according to the system's intended design. == History == === Nisan-Ronen: a new framework for studying algorithms === In 1999, the seminal paper of Noam Nisan and Amir Ronen drew the attention of the Theoretical Computer Science community to designing algorithms for selfish (strategic) users. As they claim in the abstract: We consider algorithmic problems in a distributed setting where the participants cannot be assumed to follow the algorithm but rather their own self-interest. As such participants, termed agents, are capable of manipulating the algorithm, the algorithm designer should ensure in advance that the agents’ interests are best served by behaving correctly. Following notions from the field of mechanism design, we suggest a framework for studying such algorithms. In this model the algorithmic solution is adorned with payments to the participants and is termed a mechanism. The payments should be carefully chosen as to motivate all participants to act as the algorithm designer wishes. We apply the standard tools of mechanism design to algorithmic problems and in particular to the shortest path problem. This paper coined the term algorithmic mechanism design and was recognized by the 2012 Gödel Prize committee as one of "three papers laying foundation of growth in Algorithmic Game Theory". === Price of Anarchy === The other two papers cited in the 2012 Gödel Prize for fundamental contributions to Algorithmic Game Theory introduced and developed the concept of "Price of Anarchy". In their 1999 paper "Worst-case Equilibria", Koutsoupias and Papadimitriou proposed a new measure of the degradation of system efficiency due to the selfish behavior of its agents: the ratio of between system efficiency at an optimal configuration, and its efficiency at the worst Nash equilibrium. (The term "Price of Anarchy" only appeared a couple of years later.) === The Internet as a catalyst === The Internet created a new economy—both as a foundation for exchange and commerce, and in its own right. The computational nature of the Internet allowed for the use of computational tools in this new emerging economy. On the other hand, the Internet itself is the outcome of actions of many. This was new to the classic, ‘top-down’ approach to computation that held till then. Thus, game theory is a natural way to view the Internet and interactions within it, both human and mechanical. Game theory studies equilibria (such as the Nash equilibrium). An equilibrium is generally defined as a state in which no player has an incentive to change their strategy. Equilibria are found in several fields related to the Internet, for instance financial interactions and communication load-balancing. Game theory provides tools to analyze equilibria, and a common approach is then to ‘find the game’—that is, to formalize specific Internet interactions as a game, and to derive the associated equilibria. Rephrasing problems in terms of games allows the analysis of Internet-based interactions and the construction of mechanisms to meet specified demands. If equilibria can be shown to exist, a further question must be answered: can an equilibrium be found, and in reasonable time? This leads to the analysis of algorithms for finding equilibria. Of special importance is the complexity class PPAD, which includes many problems in algorithmic game theory. == Areas of research == === Algorithmic mechanism design === Mechanism design is the subarea of economics that deals with optimization under incentive constraints. Algorithmic mechanism design considers the optimization of economic systems under computational efficiency requirements. Typical objectives studied include revenue maximization and social welfare maximization. === Inefficiency of equilibria === The concepts of price of anarchy and price of stability were introduced to capture the loss in performance of a system due to the selfish behavior of its participants. The price of anarchy captures the worst-case performance of the system at equilibrium relative to the optimal performance possible. The price of stability, on the other hand, captures the relative performance of the best equilibrium of the system. These concepts are counterparts to the notion of approximation ratio in algorithm design. === Complexity of finding equilibria === The existence of an equilibrium in a game is typically established using non-constructive fixed point theorems. There are no efficient algorithms known for computing Nash equilibria. The problem is complete for the complexity class PPAD even in 2-player games. In contrast, correlated equilibria can be computed efficiently using linear programming, as well as learned via no-regret strategies. === Computational social choice === Computational social choice studies computational aspects of social choice, the aggregation of individual agents' preferences. Examples include algorithms and computational complexity of voting rules and coalition formation. Other topics include: Algorithms for computing Market equilibria Fair division Multi-agent systems And the area counts with diverse practical applications: Sponsored search auctions Spectrum auctions Cryptocurrencies Prediction markets Reputation systems Sharing economy Matching markets such as kidney exchange and school choice Crowdsourcing and peer grading Economics of the cloud == Journals and newsletters == ACM Transactions on Economics and Computation (TEAC) SIGEcom Exchanges Algorithmic Game Theory papers are often also published in Game Theory journals such as GEB, Economics journals such as Econometrica, and Computer Science journals such as SICOMP.

    Read more →
  • Game Jolt

    Game Jolt

    Game Jolt is a social community platform for video games, gamers and content creators. Founded by Yaprak and David DeCarmine, it is available on iOS, Android, and on the web and as a desktop app for Windows and Linux. Users share interactive content through a variety of formats including images, videos, live streams, chat rooms, and virtual events. == Features == === Crowd streaming === In 2021 Game Jolt revealed their own live streaming feature called Firesides. Firesides allowed multiple users to simultaneously livestream together with nearly no delay. The feature launched with a virtual concert showcasing its ability to accommodate multiple streamers. On October 16, 2023, Firesides were removed from Game Jolt. === Mobile app === Game Jolt Social by Game Jolt Inc. launched on both the Apple App Store and Google Play Store in March 2022. "It's clear to us that Gen Z is tired of generic social media and they want a place specifically for gaming that supports all types of content they're creating–art, videos, thoughts, and livestreams all in one place." said Game Jolt founder and CEO Yaprak DeCarmine, in a statement to VentureBeat. === Game API === The Game Jolt Application Programming Interface (usually known as the Game Jolt Game API) allows any developer using a game development platform that supports HTTP operations and MD5 or SHA-1. Game Jolt advertises that the API can: Create multiple "scoreboards" which collect high scores from players made publicly available on the game's profile and give user accounts EXP Award player's trophies which give user accounts EXP Store game data on Game Jolt's data servers Log whether a user is currently playing a game they're logged into via the GJAPI == Game jams and competitions == Game Jolt regularly hosts game jams where participants are encouraged to develop games for a chance to win prizes. They hosted their first game jam in 2009, Shocking Contest. In November 2014, Game Jolt announced the "Indies vs PewDiePie" game jam, partnering with the popular YouTuber Felix "PewDiePie" Kjellberg. Developers were given a weekend (21–24 November) to create a game with the theme of "fun to play, fun to watch" to suit the Let's Plays entertainment style. Users could rate entries afterwards until December 1 when the scores were counted up. The prize to the top 10 rated games was Felix playing the games on his channel as a means of promotion for the developers, although later he played other entries. One of the participants of the jam, now known as Outerminds Inc. was discovered and hired by PewDiePie to develop his mobile game, Legend of the Brofist. Game Jolt partnered with Felix, Sean "Jacksepticeye" McLoughlin and Mark "Markiplier" Fischbach to host "Indies vs Gamers" in July 2015. The requirements for entries were arcade games using the Game Jolt Game API highscore tables, to be made between the July 17–20 and the top 5 games were played on the partner's YouTube channels. Following the "Indies vs PewDiePie" game jam in 2014, Game Jolt released their internal jam hosting tools public for all users to use as a service, to create their own game jams that integrated with the main site. Today, Game Jolt focuses on hosting and co-hosting game competitions with established brands in order to bring monetary and educational opportunities to their users. On April 15, 2024, an announcement was made about a collaboration with Pocket Worlds for the "HighRise Game Jam". Pocket Worlds had sold NFTs up until roughly 2022, causing a community outburst. The situation was addressed, and the situation started to disperse. == Contests == == Events == Game Jolt hosts both physical and virtual events to entertain and prank its users, which consists of the following: == History == Game Jolt has supported independent creators with a central platform to manage their content and communities since its start in 2003. David DeCarmine began development of Game Jolt at the age of 14 for a group of hobbyists, making games and sharing on forums in an early iteration known as Holo World. The original intention was to create a platform for gamers where new games could be discoverable and quickly playable, and where feedback could be provided directly to the creators, allowing them to continue improving their games. In 2008, Game Jolt was registered as an LLC, then incorporated as Game Jolt Inc. in September 2020. A new site launched in 2015 featuring a responsive design, automated curation for both games and game news articles which weighs how recent a game was uploaded and how popular it is ("hot") and filtering options on game listings for platform, maturity rating and development status. In March 2022, Game Jolt launched a mobile application simultaneously on the Google Play Store and Apple App Store targeted at Gen Z gamers and creators. While in beta, the mobile app had 100,000 installs pre-launch. === Game store === Game Jolt continues to host a large library of independent games. Game developers can upload their games directly to the site to share or sell. They would allow distribution for downloadable games, later adding support for Adobe Flash, Unity and Java games which allowed support for browser based games. In February 2013, Game Jolt built support for browser-based HTML5 games as well. A user levelling system was released into public beta in April 2013, incorporating the GJAPI trophies and highscores, as well as site activity, to generate 'EXP' (experience points). Game Jolt Jams released in early 2014 as a service to allow users to create their own game jams that integrated with the main site. In April 2016, an online marketplace was announced and released the following month with an exclusive set of game titles, including Bendy and the Ink Machine, allowing developers to sell their games on the site. In January 2016, Game Jolt released source code of the client and site's front end on GitHub under MIT license. In January 2022, Game Jolt banned adult games from appearing on the site, stating in an email to developers that the site had become a "social media platform" and they "had to make decisions around the direction and future of the brand which has now included the removal of hosted games with explicitly adult content." In response to a tweet by Itch.io saying the site is not for prudes, they wrote in their own tweet: "Game Jolt is a platform with a large audience of 13-16 year olds. Our users asked us to clean up, so here we are." == Investments == After bootstrapping Game Jolt with revenue earned from ads on the website for years, the DeCarmines secured venture capital in 2020 from SoftBank, doing so again in 2021 from founders of Twitch, Rec Room, Modio and more.

    Read more →
  • Knuth–Eve algorithm

    Knuth–Eve algorithm

    In computer science, the Knuth–Eve algorithm is an algorithm for polynomial evaluation. It preprocesses the coefficients of the polynomial to reduce the number of multiplications required at runtime. Ideas used in the algorithm were originally proposed by Donald Knuth in 1962. His procedure opportunistically exploits structure in the polynomial being evaluated. In 1964, James Eve determined for which polynomials this structure exists, and gave a simple method of "preconditioning" polynomials (explained below) to endow them with that structure. == Algorithm == === Preliminaries === Consider an arbitrary polynomial p ∈ R [ x ] {\displaystyle p\in \mathbb {R} [x]} of degree n {\displaystyle n} . Assume that n ≥ 3 {\displaystyle n\geq 3} . Define m {\displaystyle m} such that: if n {\displaystyle n} is odd then n = 2 m + 1 {\displaystyle n=2m+1} , and if n {\displaystyle n} is even then n = 2 m + 2 {\displaystyle n=2m+2} . Unless otherwise stated, all variables in this article represent either real numbers or univariate polynomials with real coefficients. All operations in this article are done over R {\displaystyle \mathbb {R} } . Again, the goal is to create an algorithm that returns p ( x ) {\displaystyle p(x)} given any x {\displaystyle x} . The algorithm is allowed to depend on the polynomial p {\displaystyle p} itself, since its coefficients are known in advance. === Overview === ==== Key idea ==== Using polynomial long division, we can write p ( x ) = q ( x ) ⋅ ( x 2 − α ) + ( β x + γ ) , {\displaystyle p(x)=q(x)\cdot (x^{2}-\alpha )+(\beta x+\gamma ),} where x 2 − α {\displaystyle x^{2}-\alpha } is the divisor. Picking a value for α {\displaystyle \alpha } fixes both the quotient q {\displaystyle q} and the coefficients in the remainder β {\displaystyle \beta } and γ {\displaystyle \gamma } . The key idea is to cleverly choose α {\displaystyle \alpha } such that β = 0 {\displaystyle \beta =0} , so that p ( x ) = q ( x ) ⋅ ( x 2 − α ) + γ . {\displaystyle p(x)=q(x)\cdot (x^{2}-\alpha )+\gamma .} This way, no operations are needed to compute the remainder polynomial, since it's just a constant. We apply this procedure recursively to q {\displaystyle q} , expressing p ( x ) = ( ( q ( x ) ⋅ ( x 2 − α m ) + γ m ) ⋯ ) ⋅ ( x 2 − α 1 ) + γ 1 . {\displaystyle p(x)=\left(\left(q(x)\cdot (x^{2}-\alpha _{m})+\gamma _{m}\right)\cdots \right)\cdot (x^{2}-\alpha _{1})+\gamma _{1}.} After m {\displaystyle m} recursive calls, the quotient q {\displaystyle q} is either a linear or a quadratic polynomial. In this base case, the polynomial can be evaluated with (say) Horner's method. ==== "Preconditioning" ==== For arbitrary p {\displaystyle p} , it may not be possible to force β = 0 {\displaystyle \beta =0} at every step of the recursion. Consider the polynomials p e {\displaystyle p^{e}} and p o {\displaystyle p^{o}} with coefficients taken from the even and odd terms of p {\displaystyle p} respectively, so that p ( x ) = p e ( x 2 ) + x ⋅ p o ( x 2 ) . {\displaystyle p(x)=p^{e}(x^{2})+x\cdot p^{o}(x^{2}).} If every root of p o {\displaystyle p^{o}} is real, then it is possible to write p {\displaystyle p} in the form given above. Each α i {\displaystyle \alpha _{i}} is a different root of p o {\displaystyle p^{o}} , counting multiple roots as distinct. Furthermore, if at least n − 1 {\displaystyle n-1} roots of p {\displaystyle p} lie in one half of the complex plane, then every root of p o {\displaystyle p^{o}} is real. Ultimately, it may be necessary to "precondition" p {\displaystyle p} by shifting it — by setting p ( x ) ← p ( x + t ) {\displaystyle p(x)\gets p(x+t)} for some t {\displaystyle t} — to endow it with the structure that most of its roots lie in one half of the complex plane. At runtime, this shift has to be "undone" by first setting x ← x − t {\displaystyle x\gets x-t} . === Preprocessing step === The following algorithm is run once for a given polynomial p {\displaystyle p} . At this point, the values of x {\displaystyle x} that p {\displaystyle p} will be evaluated on are not known. ==== Better choice of t ==== While any t ≥ Re ( r 2 ) {\displaystyle t\geq {\text{Re}}(r_{2})} can work, it is possible to remove one addition during evaluation if t {\displaystyle t} is also chosen such that two roots of p ( x + t ) {\displaystyle p(x+t)} are symmetric about the origin. In that case, α 1 {\displaystyle \alpha _{1}} can be chosen such that the shifted polynomial has a factor of x 2 − α 1 {\displaystyle x^{2}-\alpha _{1}} , so γ 1 = 0 {\displaystyle \gamma _{1}=0} . It is always possible to find such a t {\displaystyle t} . One possible algorithm for choosing t {\displaystyle t} is: === Evaluation step === The following algorithm evaluates p {\displaystyle p} at some, now known, point x {\displaystyle x} . Assuming t {\displaystyle t} is chosen optimally, γ 1 = 0 {\displaystyle \gamma _{1}=0} . So, the final iteration of the loop can instead run y ← y ⋅ ( s − α i ) , {\displaystyle y\gets y\cdot (s-\alpha _{i}),} saving an addition. == Analysis == In total, evaluation using the Knuth–Eve algorithm for a polynomial of degree n {\displaystyle n} requires n {\displaystyle n} additions and ⌊ n / 2 ⌋ + 2 {\displaystyle \lfloor n/2\rfloor +2} multiplications, assuming t {\displaystyle t} is chosen optimally. No algorithm to evaluate a given polynomial of degree n {\displaystyle n} can use fewer than n {\displaystyle n} additions or fewer than ⌈ n / 2 ⌉ {\displaystyle \lceil n/2\rceil } multiplications during evaluation. This result assumes only addition and multiplication are allowed during both preprocessing and evaluation. The Knuth–Eve algorithm is not well-conditioned.

    Read more →
  • Enterprise information system

    Enterprise information system

    An Enterprise Information System (EIS) is any kind of information system which improves the functions of enterprise business processes through integration. This means typically offering high quality service, dealing with large volumes of data and capable of supporting some large and possibly complex organization or enterprise. An EIS must be able to be used by all parts and all levels of an enterprise. The word enterprise can have various connotations. Frequently the term is used only to refer to very large organizations such as multi-national companies or public-sector organizations. However, the term may be used to mean virtually anything, by virtue of it having become a corporate-speak buzzword. == Purpose == Enterprise information systems provide a technology platform that enables organizations to integrate and coordinate their business processes on a robust foundation. An EIS is currently used in conjunction with customer relationship management and supply chain management to automate business processes. An enterprise information system provides a single system that is central to the organization that ensuring information can be shared across all functional levels and management hierarchies. An EIS can be used to increase business productivity and reduce service cycles, product development cycles and marketing life cycles. It may be used to amalgamate existing applications. Other outcomes include higher operational efficiency and cost savings. Financial value is not usually a direct outcome from the implementation of an enterprise information system. == Design stage == At the design stage the main characteristic of EIS efficiency evaluation is the probability of timely delivery of various messages such as command, service, etc. == Information systems == Enterprise systems create a standard data structure and are invaluable in eliminating the problem of information fragmentation caused by multiple information systems within an organization. An EIS differentiates itself from legacy systems in that it is self-transactional, self-helping and adaptable to general and specialist conditions. Unlike an enterprise information system, legacy systems are limited to department-wide communications. A typical enterprise information system would be housed in one or more data centers, would run enterprise software, and could include applications that typically cross organizational borders such as content management systems.

    Read more →