AI Avatar For Zoom Meetings

AI Avatar For Zoom Meetings — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Framebuffer

    Framebuffer

    A framebuffer (frame buffer, or sometimes framestore) is a portion of random-access memory (RAM) containing a bitmap that drives a video display. It is a memory buffer containing data representing all the pixels in a complete video frame. Modern video cards contain framebuffer circuitry in their cores. This circuitry converts an in-memory bitmap into a video signal that can be displayed on a computer monitor. In computing, a screen buffer is a part of computer memory used by a computer application for the representation of the content to be shown on the computer display. The screen buffer may also be called the video buffer, the regeneration buffer, or regen buffer for short. The phrase "screen buffer” refers to a logical function, while video memory refers to a hardware storage location. In particular, the screen buffer may be placed in the main RAM, the video memory, or some other hardware location. To reduce latency and avoid screen tearing, multiple frames can be buffered, and this technique is called multiple buffering. When this is so, at any time, only one frame would be visible, and the others would not be. The currently invisible frames are located in the off-screen buffer. The information in the buffer typically consists of color values for every pixel to be shown on the display. Color values are commonly stored in 1-bit binary (monochrome), 4-bit palettized, 8-bit palettized, 16-bit high color and 24-bit true color formats. An additional alpha channel is sometimes used to retain information about pixel transparency. The total amount of memory required for the framebuffer depends on the resolution of the output signal, and on the color depth or palette size. == History == Computer researchers had long discussed the theoretical advantages of a framebuffer but were unable to produce a machine with sufficient memory at an economically practicable cost. In 1947, the Manchester Baby computer used a Williams tube, later the Williams-Kilburn tube, to store 1024 bits on a cathode-ray tube (CRT) memory and displayed on a second CRT. Other research labs were exploring these techniques with MIT Lincoln Laboratory achieving a 4096 display in 1950. A color-scanned display was implemented in the late 1960s, called the Brookhaven RAster Display (BRAD), which used a drum memory and a television monitor. In 1969, A. Michael Noll of Bell Telephone Laboratories, Inc. implemented a scanned display with a frame buffer, using magnetic-core memory. A year or so later, the Bell Labs system was expanded to display an image with a color depth of three bits on a standard color TV monitor. The vector graphics used in the computer had to be converted for the scanned graphics of a TV display. In the early 1970s, the development of MOS memory (metal–oxide–semiconductor memory) integrated-circuit chips, particularly high-density DRAM (dynamic random-access memory) chips with at least 1 kb memory, made it practical to create, for the first time, a digital memory system with framebuffers capable of holding a standard video image. This led to the development of the SuperPaint system by Richard Shoup at Xerox PARC in 1972. Shoup was able to use the SuperPaint framebuffer to create an early digital video-capture system. By synchronizing the output signal to the input signal, Shoup was able to overwrite each pixel of data as it shifted in. Shoup also experimented with modifying the output signal using color tables. These color tables allowed the SuperPaint system to produce a wide variety of colors outside the range of the limited 8-bit data it contained. This scheme would later become commonplace in computer framebuffers. In 1974, Evans & Sutherland released the first commercial framebuffer, the Picture System, costing about $15,000. It was capable of producing resolutions of up to 512 by 512 pixels in 8-bit grayscale, and became a boon for graphics researchers who did not have the resources to build their own framebuffer. The New York Institute of Technology would later create the first 24-bit color system using three of the Evans & Sutherland framebuffers. Each framebuffer was connected to an RGB color output (one for red, one for green and one for blue), with a Digital Equipment Corporation PDP 11/04 minicomputer controlling the three devices as one. In 1975, the UK company Quantel produced the first commercial full-color broadcast framebuffer, the Quantel DFS 3000. It was first used in TV coverage of the 1976 Montreal Olympics to generate a picture-in-picture inset of the Olympic flaming torch while the rest of the picture featured the runner entering the stadium. The rapid improvement of integrated-circuit technology made it possible for many of the home computers of the late 1970s to contain low-color-depth framebuffers. Today, nearly all computers with graphical capabilities utilize a framebuffer for generating the video signal. Amiga computers, created in the 1980s, featured special design attention to graphics performance and included a unique Hold-And-Modify framebuffer capable of displaying 4096 colors. Framebuffers also became popular in high-end workstations and arcade system boards throughout the 1980s. SGI, Sun Microsystems, HP, DEC and IBM all released framebuffers for their workstation computers in this period. These framebuffers were usually of a much higher quality than could be found in most home computers, and were regularly used in television, printing, computer modeling and 3D graphics. Framebuffers were also used by Sega for its high-end arcade boards, which were also of a higher quality than on home computers. == Display modes == Framebuffers used in personal and home computing often had sets of defined modes under which the framebuffer can operate. These modes reconfigure the hardware to output different resolutions, color depths, memory layouts and refresh rate timings. In the world of Unix machines and operating systems, such conveniences were usually eschewed in favor of directly manipulating the hardware settings. This manipulation was far more flexible in that any resolution, color depth and refresh rate was attainable – limited only by the memory available to the framebuffer. An unfortunate side-effect of this method was that the display device could be driven beyond its capabilities. In some cases, this resulted in hardware damage to the display. More commonly, it simply produced garbled and unusable output. Modern CRT monitors fix this problem through the introduction of protection circuitry. When the display mode is changed, the monitor attempts to obtain a signal lock on the new refresh frequency. If the monitor is unable to obtain a signal lock or if the signal is outside the range of its design limitations, the monitor will ignore the framebuffer signal and possibly present the user with an error message. LCD monitors tend to contain similar protection circuitry, but for different reasons. Since the LCD must digitally sample the display signal (thereby emulating an electron beam), any signal that is out of range cannot be physically displayed on the monitor. == Color palette == Framebuffers have traditionally supported a wide variety of color modes. Due to the expense of memory, most early framebuffers used 1-bit (2 colors per pixel), 2-bit (4 colors), 4-bit (16 colors) or 8-bit (256 colors) color depths. The problem with such small color depths is that a full range of colors cannot be produced. The solution to this problem was indexed color, which adds a lookup table to the framebuffer. Each color stored in framebuffer memory acts as a color index. The lookup table serves as a palette with a limited number of different colors, while the rest is used as an index table. Here is a typical indexed 256-color image and its own palette (shown as a rectangle of swatches): In some designs, it was also possible to write data to the lookup table (or switch between existing palettes) on the fly, allowing dividing the picture into horizontal bars with their own palette and thus rendering an image that had a far wider palette. For example, viewing an outdoor shot photograph, the picture could be divided into four bars: the top one with emphasis on sky tones, the next with foliage tones, the next with skin and clothing tones, and the bottom one with ground colors. This required each palette to have overlapping colors, but, carefully done, allowed great flexibility. == Memory access == While framebuffers are commonly accessed via a memory mapping directly to the CPU memory space, this is not the only method by which they may be accessed. Framebuffers have varied widely in the methods used to access memory. Some of the most common are: Mapping the entire framebuffer to a given memory range. Port commands to set each pixel, range of pixels or palette entry. Mapping a memory range smaller than the framebuffer memory, then bank switching as necessary. The framebuffer organization may be packed pixel or planar. The framebuffer may be all

    Read more →
  • Automated journalism

    Automated journalism

    Automated journalism, also known as algorithmic journalism or robot journalism, is a term that attempts to describe modern technological processes that are now in use in the journalistic profession, such as news articles and videos generated by computer programs. There are four main fields of application for automated journalism, namely automated content production, data mining, news dissemination and content optimization. Through generative artificial intelligence, stories are produced automatically by computers rather than human reporters. In the 2020s, generative pre-trained transformers have enabled the generation of articles, simply by providing prompts. Automated journalism is sometimes seen as an opportunity to free journalists from routine reporting, providing them with more time for complex tasks. It also allows efficiency and cost-cutting, alleviating some financial burden that many news organizations face. However, automated journalism is also perceived as a threat to the authorship and quality of news and a threat to the livelihoods of human journalists. == History == Historically, the process involved an algorithm that scanned large amounts of provided data, selected from an assortment of pre-programmed article structures, ordered key points, and inserted details such as names, places, amounts, rankings, statistics, and other figures. These programs interpret, organize, and present data in human-readable ways. The output can also be customized to fit a certain voice, tone, or style. Early implementations were mainly used for stories based on statistics and numerical figures. Common topics include sports recaps, weather, financial reports, real estate analysis, and earnings reviews. Data science and AI companies such as Automated Insights, Narrative Science, United Robots and Monok develop and provide these algorithms to news outlets. In 2016, early adopters included news providers such as the Associated Press, Forbes, ProPublica, and the Los Angeles Times. StatSheet, an online platform covering college basketball, runs entirely on an automated program. In 2006, Thomson Reuters announced their switch to automation to generate financial news stories on its online news platform. Reuters used a tool called Tracer. An algorithm called Quakebot published a story about a 2014 California earthquake on The Los Angeles Times website within three minutes after the shaking had stopped. The Associated Press began using automation to cover 10,000 minor baseball leagues games annually, using a program from Automated Insights and statistics from MLB Advanced Media. Outside of sports, the Associated Press also uses automation to produce stories on corporate earnings. Since 2014, Associated Press has been publishing quarterly financial stories with help from Automated Insights. In May 2020, Microsoft announced that a number of its MSN contract journalists would be replaced by robot journalism. On 8 September 2020, The Guardian published an article entirely written by the neural network GPT-3, although the published fragments were manually picked by a human editor. Agentic Tribune produces all of its news articles automatically using AI. News broadcasters in Kuwait, Greece, South Korea, India, China and Taiwan have presented news with anchors based on generative AI models, prompting concerns about job losses for human anchors and audience trust in news that has historically been influenced by parasocial relationships with broadcasters, content creators or social media influencers. Algorithmically generated anchors have also been used by allies of ISIS for their broadcasts. In 2023, Google reportedly pitched a tool to news outlets that claimed to "produce news stories" based on input data provided, such as "details of current events". Some news company executives who viewed the pitch described it as "[taking] for granted the effort that went into producing accurate and artful news stories." In February 2024, Google launched a program to pay small publishers to write three articles per day using a beta generative AI model. The program does not require the knowledge or consent of the websites that the publishers are using as sources, nor does it require the published articles to be labeled as being created or assisted by these models. Meta AI, a chatbot based on Llama 3 which summarizes news stories, was noted by The Washington Post to copy sentences from those stories without direct attribution and to potentially further decrease the traffic of online news outlets. == Benefits == === Speed === Robot reporters are built to produce large quantities of information at quicker speeds. The Associated Press announced that their use of automation has increased the volume of earnings reports from customers by more than ten times. With software from Automated Insights and data from other companies, they can produce 150 to 300-word articles in the same time it takes journalists to crunch numbers and prepare information. By automating routine stories and tasks, journalists are promised more time for complex jobs such as investigative reporting and in-depth analysis of events. Francesco Marconi of the Associated Press stated that, through automation, the news agency freed up 20 percent of reporters’ time to focus on higher-impact projects. This has also been stated by a spokesperson at Gannett, who stated "By leveraging AI, we are able to expand coverage and enable our journalists to focus on more in-depth reporting." GBH reports that AI tools help increase the reach of news publishers. Mike Carragi, a product manager at Patch, stated that they were able to increase their reach from 1200 communities to 7000 communities in just a few months without the need for new employees solely through the adoption of generative AI. In fact, many communities are served solely by AI generated content, which creates summaries of existing information within the community. === Cost === Automated journalism is cheaper because more content can be produced within less time. It also lowers labour costs for news organizations. Reduced human input means less expenses on wages or salaries, paid leaves, vacations, and employment insurance. Automation serves as a cost-cutting tool for news outlets struggling with tight budgets but still wish to maintain the scope and quality of their coverage. == Concerns == === Authorship === In an automated story, there is often confusion about who should be credited as the author. Several participants of a study on algorithmic authorship attributed the credit to the programmer; others perceived the news organization as the author, emphasizing the collaborative nature of the work. There is also no way for the reader to verify whether an article was written by a robot or human, which raises issues of transparency although such issues also arise with respect to authorship attribution between human authors too. === Credibility and quality === Concerns about the perceived credibility of automated news is similar to concerns about the perceived credibility of news in general. Critics doubt if algorithms are "fair and accurate, free from subjectivity, error, or attempted influence." Again, these issues about fairness, accuracy, subjectivity, error, and attempts at influence or propaganda has also been present in articles written by humans over thousands of years. A common criticism is that machines do not replace human capabilities such as creativity, humour, and critical-thinking. However, as the technology evolves, the aim is to mimic human characteristics. When the UK's Guardian newspaper used an AI to write an entire article in September 2020, commentators pointed out that the AI still relied on human editorial content. Austin Tanney, the head of AI at Kainos said: "The Guardian got three or four different articles and spliced them together. They also gave it the opening paragraph. It doesn’t belittle what it is. It was written by AI, but there was human editorial on that." The largest single study of readers' evaluations of news articles produced with and without the help of automation exposed 3,135 online news consumers to 24 articles. It found articles that had been automated were significantly less comprehensible, in part because they were considered to contain too many numbers. However, the automated articles were evaluated equally on other criteria including tone, narrative flow, and narrative structure. Beyond human evaluation, there are now numerous algorithmic methods to identify machine written articles although some articles may still contain errors that are obvious for a human to identify, they can at times score better with these automatic identifiers than human-written articles. A 2017 Nieman Reports article by Nicola Bruno discusses whether or not machines will replace journalists and addresses concerns around the concept of automated journalism practices. Ultimately, Bruno came to the conclusion that AI would assist journalist

    Read more →
  • Evidence-based library and information practice

    Evidence-based library and information practice

    Evidence-based library and information practice (EBLIP) or evidence-based librarianship (EBL) is the use of evidence-based practices (EBP) in the field of library and information science (LIS). This means that all practical decisions made within LIS should 1) be based on research studies and 2) that these research studies are selected and interpreted according to some specific norms characteristic for EBP. Typically such norms disregard theoretical studies and qualitative studies and consider quantitative studies according to a narrow set of criteria of what counts as evidence. If such a narrow set of methodological criteria are not applied, it is better instead to speak of research based library and information practice. == Characteristics == Evidence-based practice in general has been characterised as a positivist approach; EBLIP is therefore also a positivist approach to LIS. As such, EBLIP is an approach in contrast to other approaches to LIS. The use of statistical approaches known as meta-analysis to conclude what evidence has been reported in the literature is one among other methods which is typical for the evidence-based approach. In 2002, Booth noted the three schools of EBILP had some commonalities, including the context of day-to-day decision-making, an emphasis on improving the quality of professional practice, a pragmatic focus on the 'best available evidence', incorporation of the user perspective, the acceptance of a broad range of quantitative and qualitative research designs, and access, either first-hand or second-hand, to the (process of) evidence-based practice and its products. He added one more, that EBILP is concerned with getting the best value for money. == The role of library and information science in EBP == Evidence-based practice in general is based on a very thorough search of the scientific literature and a very thorough selection and analysis of the retrieved literature. A close familiarity with database searching is needed, and library and information professionals have important roles to play in this respect. Therefore LIS professionals should be well suited to help professionals in other disciplines doing EBP. EBLIP is the application of this approach on LIS itself. It should be mentioned, however, that EBP started in medicine as evidence-based medicine (EBM) from which it spread to other fields. Only slowly and to a limited extent has EBP moved on to LIS. The EBLIP process can be applied to a variety of scenarios in LIS, including customer service, collection development, library management and information literacy instruction. In general, quantitative methods are used in LIS research. A 2010 study revealed five categories that capture the different ways library and information professionals experience evidence-based practice: Evidence-based practice is experienced as irrelevant; Evidence-based practice is experienced as learning from published research; Evidence-based practice is experienced as service improvement; Evidence-based practice is experienced as a way of being; Evidence-based practice is experienced as a weapon.

    Read more →
  • E-Science librarianship

    E-Science librarianship

    E-Science librarianship refers to a role for librarians in e-Science. == Early scholars == Early references to e-Science and librarianship involve information studies scholars researching cyberinfrastructure and emerging networked information and knowledge communities. Notably Christine Borgman, Professor and Presidential Chair in Information Studies at the University of California, Los Angeles (UCLA) was a key player in bringing e-Science, and the idea of networked knowledge communities, to the attention of the library profession. In 2004, as a visiting fellow at the Oxford Internet Institute, she conducted research and lectured publicly on e-Science, Digital Libraries, and Knowledge Communities. In 2007 Anna K. Gold, formerly of MIT and Cal Poly, San Luis Obispo, authored a series of articles in D-Lib Magazine that opened the door for academic libraries to begin exploring roles, skills, and strategies for engaging in e-Science: Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians and Cyberinfrastructure, Data, and Libraries, Part 2: Libraries and the Data Challenge: Roles and Actions for Libraries. == Academic research and health sciences libraries == In 2007, the Association of Research Libraries (ARL) e-Science task force issued its report on e-Science and librarianship. The ARL's report encouraged its member libraries to position themselves to engage with researchers involved in e-Science (eScience) by cultivating new research support strategies and developing their digital scholarship infrastructure. E-Science has multiple attributes; Tony and Jessie Hey framed e-Science for the library community by characterizing it as a research methodology: "e-Science is not a new scientific discipline in its own right: e-Science is shorthand for the set of tools and technologies required to support collaborative, networked science". In addition to academic libraries' interests in providing support for their researchers engaging in e-Science, the health sciences library community also emerged as a major proponent for creating librarian positions for supporting the information needs of large-scale, networked, research collaborations on their campuses. Neil Rambo, current director of NYU's Health Sciences Library and former director of University of Washington Health Sciences Library, was the first to use the term in the Journal of the Medical Library Association, in his 2009 editorial e-Science and the Biomedical Library. Rambo's definition of e-Science highlighted the potential e-Science held for creating data as a research product: "E-science is a new research methodology, fueled by networked capabilities and the practical possibility of gathering and storing vast amounts of data." In response to this article the University of Massachusetts Medical School Lamar Soutter Library and National Network of Libraries of Medicine, New England Region encouraged health sciences libraries to cooperate to identify skills and develop a program for training e-Science Librarians. Then, in 2013, Shannon Bohle, an archivist who was employed in the library at Cold Spring Harbor Laboratory, an NCI-designated basic cancer research facility, used experience gained there and previous papers and presentations about preserving scientific archival materials to expand the traditional definition of e-Science by including the terms, principles, and practices used in archival science. These included in the definition the "long-term storage and accessibility of all materials generated through the scientific process," as well as examples of material types traditionally preserved in archives, like "electronic/digitized laboratory notebooks, raw and fitted data sets, manuscript production and draft versions, pre-prints," as well as library materials ("print and/or electronic publications"). == Roles == Many areas of science are about to be transformed by the availability of vast amounts of new scientific data that can potentially provide insights at a level of detail never before envisaged. However, this new data dominant era brings new challenges for the scientists and they will need the skills and technologies both of computer scientists and of the library community to manage, search and curate these new data resources. Libraries will not be immune from change in this new world of research. Karen Williams identifies roles in the following areas for librarians in the developing world of e-Science. Campus Engagement Content/Collection Development and Management Teaching and Learning Scholarly Communication E-Scholarship and Digital Tools Reference/Help Services Outreach Fund Raising Exhibit and Event Planning Leadership == Challenges for research libraries == E-science tends toward inter- and multidisciplinary approaches that depend on computation and computer science. Research libraries have traditionally been discipline focused and, although increasingly technologically sophisticated, do not have systems of the scale or complexity of the e-science environment. E-science is data intensive, but research libraries have not typically been responsible for scientific data. E-science is frequently conducted in a team context, often distributed across multiple institutions and on a global scale. The primary constituency of libraries generally comprises those affiliated with the local institution. Licenses for electronic content are typically restricted to a particular institutional community, and the infrastructure to move institutional licenses into a multi-institutional environment is not well developed. E-science challenges all these traditional paradigms of research library organization and services. == Skills == Garritano & Carlson were among the first to outline a skill set for librarians seeking to support the data needs of e-Science; they identified five skill categories librarians new to this area should expect to adapt or develop when participating on such projects: Library and information science expertise Subject expertise Partnerships and outreach (both internal and external) Participating in sponsored research Balancing workload An example of librarians reconfiguring traditional librarian skills to meet the needs of researchers engaging in e-Science is Witt & Carlson's adaptation of the traditional reference interview into a "data interview" in order to provide effective data management and e-Science services. This interview consists of ten practical queries necessary for understanding the provenance and expectations for the preservation of datasets typical of e-Science that also help illustrate some of the educational tools and skills needed by a librarian new to e-Science. "What is the story of the data? What form and format are the data in? What is the expected lifespan of the dataset? How could the data be used, reused, and repurposed? How large is the dataset, and what is its rate of growth? Who are the potential audiences for the data? Who owns the data? Does the dataset include any sensitive information? What publications or discoveries have resulted from the data? How should the data be made accessible?" == Resources == In 2009 the Lamar Soutter Library at the University of Massachusetts Medical School (UMMS) and the National Network of Libraries of Medicine, New England Region (NN/LM NER) funded an e-Science program for building the skills highlighted above for librarians. Elaine Russo Martin, Director of Library Services at the Lamar Soutter Library and Director of the NN/LM NER developed this comprehensive e-Science program to build librarians' subject expertise in the sciences, developing their data management skills, and their familiarity with cyberinfrastructure and e-Science. Three major products of this program are the e-Science web portal for librarians, the E-Science Symposium, and the New England Collaborative Data Management Curriculum (NECDMC). This portal includes educational resources for specific tools and subject/discipline tutorials and modules to assist librarians new to e-Science. UMMS and NN/LM NER also publish an open access journal called the Journal of eScience Librarianship.

    Read more →
  • Tweak programming environment

    Tweak programming environment

    Tweak is a graphical user interface (GUI) layer written by Andreas Raab for the Squeak development environment, which in turn is an integrated development environment based on the Smalltalk-80 computer programming language. Tweak is an alternative to an earlier graphic user interface layer called Morphic. Development began in 2001. Applications that use the Tweak software include Sophie (version 1), a multimedia and e-book authoring system, and a family of virtual world systems: Open Cobalt, Teleplace, OpenQwaq, 3d ICC's Immersive Terf and the Croquet Project. == Influences == An experimental version of Etoys, a programming environment for children, used Tweak instead of Morphic. Etoys was a major influence on a similar Squeak-based programming environment known as Scratch.

    Read more →
  • Webometrics Ranking of Business Schools

    Webometrics Ranking of Business Schools

    The Webometrics Ranking of Business Schools, also known as Ranking Web of Business Schools, is a ranking system for the world's business schools based on a composite indicator that takes into account both the volume of the Web content (number of web pages and files) and the visibility and impact of these web publications according to the number of external inlinks (site citations) they received. The ranking is published by the Cybermetrics Lab, a research group of the Spanish National Research Council (CSIC) located in Madrid. This ranking was discontinued in 2013 and is no longer updated. This discontinued ranking is, however, often cited (as of 2017-06-16) by Google as its main ranking reference. Examples are: "Spain business school ranking " = "Zurich business school ranking" etc. The Webometrics Ranking of World Universities is a similar ranking of universities.

    Read more →
  • Terminology extraction

    Terminology extraction

    Terminology extraction (also known as term extraction, glossary extraction, term recognition, or terminology mining) is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is also essential to the language industry. One of the first steps to model a knowledge domain is to collect a vocabulary of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases. Noun phrases include compounds (e.g. "credit card"), adjective noun phrases (e.g. "local tourist information office"), and prepositional noun phrases (e.g. "board of directors"). In English, the first two (compounds and adjective noun phrases) are the most frequent. Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology or a terminology base. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc. == Bilingual terminology extraction == The methods for terminology extraction can be applied to parallel corpora. Combined with e.g. co-occurrence statistics, candidates for term translations can be obtained. Bilingual terminology can be extracted also from comparable corpora (corpora containing texts within the same text type, domain but not translations of documents between each other).

    Read more →
  • Algorithm

    Algorithm

    In mathematics and computer science, an algorithm ( ) is a finite sequence of mathematically rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing calculations and data processing. More advanced algorithms can use conditionals to divert the code execution through various routes (referred to as automated decision-making) and deduce valid inferences (referred to as automated reasoning). In contrast, a heuristic is an approach to solving problems without well-defined correct or optimal results. For example, although social media recommender systems are commonly called "algorithms", they actually rely on heuristics as there is no truly "correct" recommendation. As an effective method, an algorithm can be expressed within a finite amount of space and time and in a well-defined formal language for calculating a function. Starting from an initial state and input, a computation occurs at each step, eventually producing output and terminating. The transition between states can be non-deterministic; randomized algorithms incorporate random input. == Etymology == Around 825 AD, Persian scientist and polymath Muḥammad ibn Mūsā al-Khwārizmī wrote kitāb al-ḥisāb al-hindī ("Book of Indian computation") and kitab al-jam' wa'l-tafriq al-ḥisāb al-hindī ("Addition and subtraction in Indian arithmetic"). In the early 12th century, Latin translations of these texts involving the Hindu–Arabic numeral system and arithmetic appeared, for example Liber Alghoarismi de practica arismetrice, attributed to John of Seville, and Liber Algoritmi de numero Indorum, attributed to Adelard of Bath. Here, alghoarismi or algoritmi is the Latinization of Al-Khwarizmi's name; the text starts with the phrase Dixit Algoritmi, or "Thus spoke Al-Khwarizmi". The word algorism in English came to mean the use of place-value notation in calculations; it occurs in the Ancrene Wisse from circa 1225. By the time Geoffrey Chaucer wrote The Canterbury Tales in the late 14th century, he used a variant of the same word in describing augrym stones, stones used for place-value calculation. In the 15th century, under the influence of the Greek word ἀριθμός (arithmos, "number"; cf. "arithmetic"), the Latin word was altered to algorithmus. By 1596, this form of the word was used in English, as algorithm, by Thomas Hood. == Definition == One informal definition is "a set of rules that precisely defines a sequence of operations", which would include all computer programs, and any bureaucratic procedure or cook-book recipe. In general, a program is an algorithm only if it stops eventually. Formally, algorithm is an explicit set of instructions to produce an output, that can be followed by a computer or a human performing specific operations on symbols.. == History == === Ancient algorithms === Step-by-step procedures for solving mathematical problems have been recorded since antiquity. This includes in Babylonian mathematics (around 2500 BC), Egyptian mathematics (around 1550 BC), Indian mathematics (around 800 BC and later), the Ifa Oracle (around 500 BC), Greek mathematics (around 240 BC), Chinese mathematics (around 200 BC and later), and Arabic mathematics (around 800 AD). The earliest evidence of algorithms is found in ancient Mesopotamian mathematics. A Sumerian clay tablet found in Shuruppak near Baghdad and dated to c. 2500 BC describes the earliest division algorithm. During the Hammurabi dynasty c. 1800 – c. 1600 BC, Babylonian clay tablets described algorithms for computing formulas. Algorithms were also used in Babylonian astronomy. Babylonian clay tablets describe and employ algorithmic procedures to compute the time and place of significant astronomical events. Algorithms for arithmetic are also found in ancient Egyptian mathematics, dating back to the Rhind Mathematical Papyrus c. 1550 BC. Algorithms were later used in ancient Hellenistic mathematics. Two examples are the Sieve of Eratosthenes, which was described in the Introduction to Arithmetic by Nicomachus, and the Euclidean algorithm, which was first described in Euclid's Elements (c. 300 BC).Examples of ancient Indian mathematics included the Shulba Sutras, the Kerala School, and the Brāhmasphuṭasiddhānta. In the 9th century, Muḥammad ibn Mūsā al-Khwārizmī revolutionized the field by establishing the algorithm as a systematic, finite sequence of logical steps to solve mathematical problems. In his influential work, The Compendious Book on Calculation by Completion and Balancing, he moved beyond specific numerical solutions to introduce general procedures for algebraic reduction and balancing. This transformed mathematics into a 'mechanical' process of well-defined rules—a fundamental shift that laid the groundwork for modern algorithmic theory. The Latin translation of his arithmetic treatise, titled Algoritmi de numero Indorum, led to the term algorithm being derived from the Latinization of his name, Algoritmi, specifically to describe this new rule-based approach to mathematics. The first cryptographic algorithm for deciphering encrypted code was developed by Al-Kindi, a 9th-century Arab mathematician, in A Manuscript On Deciphering Cryptographic Messages. He gave the first description of cryptanalysis by frequency analysis, the earliest codebreaking algorithm. === Computers === ==== Weight-driven clocks ==== Weight-driven clocks were a key European invention in Middle Ages, specifically the verge escapement mechanism producing the tick of mechanical clocks. Accurate automatic machines led to mechanical automata in the 13th century and computational machines—the difference and analytical engines of Charles Babbage and Ada Lovelace in the mid-19th century. Lovelace designed the first algorithm intended for a computer, Babbage's analytical engine, the first real Turing-complete computer, more than the mechanical calculators of the time. Although the full implementation of Babbage's second device was only built decades after her lifetime, Lovelace has been called "history's first programmer". ==== Electromechanical relay ==== The Jacquard loom, a precursor to punch cards, and telephone switching machines led to the development of the first computers. By the mid-19th century, the telegraph, was in use throughout the world. By the late 19th century, ticker tape (c. 1870s) and punch cards (c. 1890) were developed. Then came the teleprinter (c. 1910) with its punched-paper use of Baudot code on tape. Telephone-switching networks of electromechanical relays were invented in 1835. These led to the invention of the digital adding device by George Stibitz in 1937. While working in Bell Laboratories, he observed the "burdensome" use of mechanical calculators with gears, prompting him to experiment create an experimental digital adder at home. === Formalization === In 1928, a partial formalization of the modern concept of algorithms began with attempts to solve David Hilbert's Entscheidungsproblem (decision problem). Later formalizations were framed as attempts to define "effective calculability" or "effective method". Those formalizations included the Gödel–Herbrand–Kleene recursive functions of 1930, 1934 and 1935, Alonzo Church's lambda calculus of 1936, Emil Post's Formulation 1 of 1936, and Alan Turing's Turing machines of 1936–37 and 1939. === Modern Algorithms === For decades, it was assumed that algorithm evolution progresses from heuristics to formal algorithms. A Symbolic integration provides a classic illustration. In 1961, James Slagle’s program SAINT used heuristics to solve 52 of 54 freshman calculus exercises from an MIT textbook (≈96%). In 1967, Larry Moses’s SIN refined the heuristics and achieved 100% success, though it remained heuristic. Finally, in 1969, Robert Risch introduced the Risch Algorithm with formal guarantees. This trajectory defined the traditional path: heuristics evolving until a definitive, guaranteed algorithm emerged. However, the rise of transformer-based AI has inverted this sequence — classical algorithms are now being displaced by heuristics once again. Algorithms have evolved and improved in many ways as time goes on. Common uses of algorithms today include social media apps like Instagram and YouTube. Algorithms are used as a way to analyze what people like and push more of those things to the people who interact with them. Quantum computing uses quantum algorithm procedures to solve problems faster. More recently, in 2024, NIST updated their post-quantum encryption standards, which includes new encryption algorithms to enhance defenses against attacks using quantum computing. == Representations == Algorithms can be expressed in many kinds of notation, including natural languages, pseudocode, flowcharts, drakon-charts, programming languages or control tables. Natural language expressions of algorithms tend to be verbose and ambiguous and are rarely used for complex or technical algor

    Read more →
  • Lawbot

    Lawbot

    Lawbots are a broad class of customer-facing legal AI applications that are used to automate specific legal tasks, such as document automation and legal research. The terms robot lawyer and lawyer bot are used as synonyms to lawbot. A robot lawyer or a robo-lawyer refers to a legal AI application that can perform tasks that are typically done by paralegals or young associates at law firms. However, there is some debate on the correctness of the term. Some commentators say that legal AI is technically speaking neither a lawyer nor a robot and should not be referred to as such. Other commentators believe that the term can be misleading and note that the robot lawyer of the future will not be one all-encompassing application but a collection of specialized bots for various tasks. Lawbots use various artificial intelligence techniques or other intelligent systems to limit humans' direct ongoing involvement in certain steps of a legal matter. The user interfaces on lawbots vary from smart searches and step-by-step forms to chatbots. Consumer and enterprise-facing lawbot solutions often do not require direct supervision from a legal professional. Depending on the task, some client-facing solutions used at law firms operate under an attorney supervision. == Levels of autonomy == The following levels of autonomy (LoA) are suggested for automated AI legal reasoning: Level 0 (LoA0): No automation for AI legal reasoning Level 1 (LoA1): Simple assistance automation Level 2 (LoA2): Advanced assistance automation Level 3 (LoA3): Semi-autonomous automation Level 4 (LoA4): Domain automation Level 5 (LoA5): Fully-autonomous automation Level 6 (LoA6): Superhuman automation == Examples == Some legal AI solutions are developed and marketed directly to the customers or consumers, whereas other applications are tools for the attorneys at law firms. There are already hundreds of legal AI solutions that operate in multitude of ways varying in sophistication and dependence on scripted algorithms. One notable legal technology chatbot application is DoNotPay. It had started off as an app for contesting parking tickets, but has since expanded to include features that help users with many different types of legal issues, ranging from consumer protection to immigration rights and other social issues. == Impact on the legal industry == In the 2016 report, Deloitte estimated that more than 110,000 law jobs in just the United Kingdom alone could disappear within the next twenty years due to automation. This change could result in the creation of more highly skilled jobs and in the reduction of paralegal and temporary positions. Deloitte's report asserts that "there is significant potential for high-skilled roles that involve repetitive processes to be automated by smart and self-learning algorithms". According to Lawyers to Engage, between 22% of a lawyer’s work and 35% of a legal assistant’s work can be automated in the US. Top law schools like Harvard have already begun to integrate Artificial Intelligence into the curriculum. Legal tech start-up companies have begun developing applications that assist law firms with completing low-risk legal processes. These applications can enable lawyers to focus on more work that requires their specific expertise. The automation of processes like contract reviewing, enforcement of negotiations (smart contracts) and client intake (expert systems) allows law firms to streamline their procedures and improve efficiency. In addition, automation benefits small-to-medium law firms that do not have the resources to utilize junior talent on such routine tasks. The increase of law firms utilizing automated applications could result into legal tech becoming a necessity in the industry. Digital Reason CEO, Tim Estes, stated that those who refuse the opportunity to integrate AI in their workflow are “most at risk.” In 2018, Forbes reported a 713% increase in investments in legal tech. This rapid growth is reflective of law firms beginning to “cede business to… new model legal providers… that meld technological, business and legal expertise.” == Access to law and justice == It has been widely estimated for at least the last generation that all the programs and resources devoted to ensuring access to justice address only 20% of the civil legal needs of low-income people in the United States. Drawing on this experience, in late 2011, the U.S. government-funded Legal Services Corporation decided to convene a summit of leaders to explore how best to use technology in the access-to-justice community. The group adopted a mission for The Summit on the Use of Technology to Expand Access to Justice (Summit) consistent with the magnitude of the challenge: "to explore the potential of technology to move the United States toward providing some form of effective assistance to 100% of persons otherwise unable to afford an attorney for dealing with essential civil legal needs". In April 2017, joined by Microsoft and Pro Bono Net, the Legal Services Corporation (LSC) announced a pilot program to develop online, statewide legal portals to direct individuals with civil legal needs to the most appropriate forms of assistance. == Technological limitations == Current research in subjects such as computational privacy, explainable machine learning, Bayesian deep learning, knowledge-intensive machine learning, and transfer learning reveals that we do not yet have the technology to enable Level 4 to 6 AI lawbots. In 2023, OpenLaw began developing a model called Law Bot, which interacts in a conversational way as an attorney. The dialogue format makes it possible for Law Bot to answer follow-up questions, challenge incorrect premises, and reject inappropriate requests. Currently, they try to ensure it is in full compliance with all laws and regulations while conducting further beta testing before releasing it to the general public.

    Read more →
  • Authoritative Legal Entity Identifier

    Authoritative Legal Entity Identifier

    An Authoritative Legal Entity Identifier (ALEI) is the identifier assigned by a government jurisdiction authorized by statute or decree to create a legal entity and to maintain the authoritative registries of legal entities. ALEIs are used within supply chain data, ERP applications and master data management systems to support accurate and consistent identification of entities in digital records, supply chains, and government databases. ALEIs are described in the international standard ISO 8000-116, which outlines a structured format that makes the locally unique identifier into a globally unique one and ensures global interoperability and data quality. == Structure == An ALEI is composed of three main components: a prefix that identifies the jurisdiction and register, a subdomain element (optional), and the local registration number of the entity. For example, the identifier "US-DE.BER:3031657" refers to an entity registered in the Delaware Business Entity Register in the United States. The standardization of this structure is governed by ISO 8000-116, which is designed to ensure each ALEI is globally unique and resolvable. == Comparison with other identifiers == ALEIs differ from proxy identifiers such as the DUNS number, NCAGE code, or the Legal Entity Identifier (LEI) managed by GLEIF. While proxy identifiers can be issued by institutions that do not create legal entities, ALEIs are created and maintained by public bodies with the authority to form and register legal entities. This authoritative origin makes ALEIs particularly suitable for applications involving legal traceability, government regulation, and international transparency efforts. == Usage == ALEIs are increasingly utilized to identify legal entities in public and private datasets. The identifiers support supply chain accuracy, regulatory compliance, and the unification of master data. The first practical implementation of an ALEI was the International Business Registration Number (IBRN), developed to provide globally unique identifiers for registered business entities. IBRNs are issued by authorized government jurisdictions and are used to verify entities across borders, particularly in the context of trade facilitation and data exchange systems. For instance, business directories and registration systems in U.S. states like Connecticut provide structured registration documents that can be used to verify the ALEIs they issue. The use of ALEIs has been recommended by international organizations such as the Extractive Industries Transparency Initiative (EITI) and Open ownership to improve beneficial ownership registries. == Policy and regulation == ALEIs have been referenced in policy consultations such as those related to the U.S. Financial Data Transparency Act. Federal institutions including the Federal Reserve and FDIC have examined the potential for ALEIs to unify entity identification across regulatory databases.

    Read more →
  • NoSQL

    NoSQL

    NoSQL (originally meaning "not only SQL" or "non-relational") refers to a type of database design that stores and retrieves data differently from the traditional table-based structure of relational databases. Unlike relational databases, which organize data into rows and columns like a spreadsheet, NoSQL databases use a single data structure—such as key–value pairs, wide columns, graphs, or documents—to hold information. Since this non-relational design does not require a fixed schema, it scales easily to manage large, often unstructured datasets. NoSQL systems are sometimes called "Not only SQL" because they can support SQL-like query languages or work alongside SQL databases in polyglot-persistent setups, where multiple database types are combined. Non-relational databases date back to the late 1960s, but the term "NoSQL" emerged in the early 2000s, spurred by the needs of Web 2.0 companies like social media platforms. NoSQL databases are popular in big data and real-time web applications due to their simple design, ability to scale across clusters of machines (called horizontal scaling), and precise control over data availability. These structures can speed up certain tasks and are often considered more adaptable than fixed database tables. However, many NoSQL systems prioritize speed and availability over strict consistency (per the CAP theorem), using eventual consistency—where updates reach all nodes eventually, typically within milliseconds, but may cause brief delays in accessing the latest data, known as stale reads. While most lack full ACID transaction support, some, like MongoDB, include it as a key feature. == Barriers to adoption == Barriers to wider NoSQL adoption include their use of low-level query languages instead of SQL, inability to perform ad hoc joins across tables, lack of standardized interfaces, and significant investments already made in relational databases. Some NoSQL systems risk losing data through lost writes or other forms, though features like write-ahead logging—a method to record changes before they’re applied—can help prevent this. For distributed transaction processing across multiple databases, keeping data consistent is a challenge for both NoSQL and relational systems, as relational databases cannot enforce rules linking separate databases, and few systems support both ACID transactions and X/Open XA standards for managing distributed updates. Limitations within the interface environment are overcome using semantic virtualization protocols, such that NoSQL services are accessible to most operating systems. == History == The term NoSQL was used by Carlo Strozzi in 1998 to name his lightweight Strozzi NoSQL open-source relational database that did not expose the standard Structured Query Language (SQL) interface, but was still relational. His NoSQL RDBMS is distinct from the around-2009 general concept of NoSQL databases. Strozzi suggests that, because the current NoSQL movement "departs from the relational model altogether, it should therefore have been called more appropriately 'NoREL'", referring to "not relational". Johan Oskarsson, then a developer at Last.fm, reintroduced the term NoSQL in early 2009 when he organized an event to discuss "open-source distributed, non-relational databases". The name attempted to label the emergence of an increasing number of non-relational, distributed data stores, including open source clones of Google's Bigtable/MapReduce and Amazon's DynamoDB. == Types and examples == There are various ways to classify NoSQL databases, with different categories and subcategories, some of which overlap. What follows is a non-exhaustive classification by data model, with examples: === Key–value store === Key–value (KV) stores use the associative array (also called a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key–value pairs, such that each possible key appears at most once in the collection. The key–value model is one of the simplest non-trivial data models, and richer data models are often implemented as an extension of it. The key–value model can be extended to a discretely ordered model that maintains keys in lexicographic order. This extension is computationally powerful, in that it can efficiently retrieve selective key ranges. Key–value stores can use consistency models ranging from eventual consistency to serializability. Some databases support ordering of keys. There are various hardware implementations, and some users store data in memory (RAM), while others on solid-state drives (SSD) or rotating disks (aka hard disk drive (HDD)). === Document store === The central concept of a document store is that of a "document". While the details of this definition differ among document-oriented databases, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and JSON and binary forms like BSON. Documents are addressed in the database via a unique key that represents that document. Another defining characteristic of a document-oriented database is an API or query language to retrieve documents based on their contents. Different implementations offer different ways of organizing and/or grouping documents: Collections Tags Non-visible metadata Directory hierarchies Compared to relational databases, collections could be considered analogous to tables and documents analogous to records. But they are different – every record in a table has the same sequence of fields, while documents in a collection may have fields that are completely different. === Graph === Graph databases are designed for data whose relations are well represented as a graph consisting of elements connected by a finite number of relations. Examples of data include social relations, public transport links, road maps, network topologies, etc. Graph databases and their query language == Performance == The performance of NoSQL databases is usually evaluated using the metric of throughput, which is measured as operations per second. Performance evaluation must pay attention to the right benchmarks such as production configurations, parameters of the databases, anticipated data volume, and concurrent user workloads. Ben Scofield rated different categories of NoSQL databases as follows: Performance and scalability comparisons are most commonly done using the YCSB benchmark. == Handling relational data == Since most NoSQL databases lack ability for joins in queries, the database schema generally needs to be designed differently. There are three main techniques for handling relational data in a NoSQL database. (See table join and ACID support for NoSQL databases that support joins.) === Multiple queries === Instead of retrieving all the data with one query, it is common to do several queries to get the desired data. NoSQL queries are often faster than traditional SQL queries, so the cost of additional queries may be acceptable. If an excessive number of queries would be necessary, one of the other two approaches is more appropriate. === Caching, replication and non-normalized data === Instead of only storing foreign keys, it is common to store actual foreign values along with the model's data. For example, each blog comment might include the username in addition to a user id, thus providing easy access to the username without requiring another lookup. When a username changes, however, this will now need to be changed in many places in the database. Thus this approach works better when reads are much more common than writes. === Nesting data === With document databases like MongoDB it is common to put more data in a smaller number of collections. For example, in a blogging application, one might choose to store comments within the blog post document, so that with a single retrieval one gets all the comments. Thus in this approach a single document contains all the data needed for a specific task. == ACID and join support == A database is marked as supporting ACID properties (atomicity, consistency, isolation, durability) or join operations if the documentation for the database makes that claim. However, this doesn't necessarily mean that the capability is fully supported in a manner similar to most SQL databases. == Query optimization and indexing in NoSQL databases == Different NoSQL databases, such as DynamoDB, MongoDB, Cassandra, Couchbase, HBase, and Redis, exhibit varying behaviors when querying non-indexed fields. Many perform full-table or collection scans for such queries, applying filtering operations after retrieving data. However, modern NoSQL databases often incorporate advanced features to optimize query performance. For example, MongoDB supports compound indexes and query-optimization strategies, Cassandra offers secondary indexes and materialized views, and Redis employs custom indexing mechanisms tailored to specific use cases. Systems like El

    Read more →
  • MarkLogic Server

    MarkLogic Server

    MarkLogic Server is a document-oriented database developed by MarkLogic. It is a NoSQL multi-model database that evolved from an XML database to natively store JSON documents and RDF triples, the data model for semantics. MarkLogic is designed to be a data hub for operational and analytical data. == History == MarkLogic Server was built to address shortcomings with existing search and data products. The product first focused on using XML as the document markup standard and XQuery as the query standard for accessing collections of documents up to hundreds of terabytes in size. Currently the MarkLogic platform is widely used in publishing, government, finance and other sectors. MarkLogic's customers are mostly Global 2000 companies. == Technology == MarkLogic uses documents without upfront schemas to maintain a flexible data model. In addition to having a flexible data model, MarkLogic uses a distributed, scale-out architecture that can handle hundreds of billions of documents and hundreds of terabytes of data. It has received Common Criteria certification, and has high availability and disaster recovery. MarkLogic is designed to run on-premises and within public or private cloud environments like Amazon Web Services. == Features == Indexing MarkLogic indexes the content and structure of documents including words, phrases, relationships, and values in over 200 languages with tokenization, collation, and stemming for core languages. Functionality includes the ability to toggle range indexes, geospatial indexes, the RDF triple index, and reverse indexes on or off based on your data, the kinds of queries that you will run, and your desired performance. Full-text search MarkLogic supports search across its data and metadata using a word or phrase and incorporates Boolean logic, stemming, wildcards, case sensitivity, punctuation sensitivity, diacritic sensitivity, and search term weighting. Data can be searched using JavaScript, XQuery, SPARQL, and SQL. Semantics MarkLogic uses RDF triples to provide semantics for ease of storing metadata and querying. ACID Unlike other NoSQL databases, MarkLogic maintains ACID consistency for transactions. Replication MarkLogic provides high availability with replica sets. Scalability MarkLogic scales horizontally using sharding. MarkLogic can run over multiple servers, balancing the load or replicating data to keep the system up and running in the event of hardware failure. Security MarkLogic has built in security features such as element-level permissions and data redaction. Optic API for Relational Operations An API that lets developers view their data as documents, graphs or rows. Security MarkLogic provides redaction, encryption, and element-level security (allowing for control on read and write rights on parts of a document). == Applications == Banking Big Data Fraud prevention Insurance Claims Management and Underwriting Master data management Recommendation engines == Licensing == MarkLogic is available under various licensing and delivery models, namely a free Developer or an Essential Enterprise license.[3] Licenses are available from MarkLogic or directly from cloud marketplaces such as Amazon Web Services and Microsoft Azure. == Releases == 2001 – Cerisent XQE 1: ACID transactions, Full-text search, XML Storage, XQuery, Role-based security 2004 – Cerisent XQE 2: Scale-out architecture, Enhanced search (stemming, thesaurus, wildcard), Backup and restore 2005 – MarkLogic Server 3: Continuing search improvements, Content Processing Framework (including PDF, Word, Excel, PPT), Failover 2008 – MarkLogic Server 4: Geospatial search, entity extraction, advanced XQuery, performance, scalability enhancements, auditing 2011 – MarkLogic Server 5: Flexible replication / DDIL, real-time indexing, advanced search, improved analytics, concurrency enhancements 2012 – MarkLogic Server 6: REST and Java APIs, App Builder, enhanced UI, improved search 2013 – MarkLogic Server 7: Semantic graph, bitemporal data, tiered storage, improved search, better management 2015 – MarkLogic Server 8: A Native JSON storage, Server-side JavaScript, Bitemporal, Node.js client API, Incremental backup, Flexible replication[16] 2017 – MarkLogic Server 9: Data integration across Relational and Non-Relational data, Advanced Encryption, Element Level Security, Redaction 2019 – MarkLogic Server 10: Enhanced Data Hub, improved SQL, security, analytics performance, cloud support 2022 – MarkLogic Server 11: MarkLogic Ops Director (Monitoring and Administration Improvements), expanded PKI 2025 – MarkLogic Server 12: Generative AI and Native Vector Search, Graph Algorithm Support, Virtual TDEs (relational views on the fly)

    Read more →
  • Spatial embedding

    Spatial embedding

    Spatial embedding is one of feature learning techniques used in spatial analysis where points, lines, polygons or other spatial data types. representing geographic locations are mapped to vectors of real numbers. Conceptually it involves a mathematical embedding from a space with many dimensions per geographic object to a continuous vector space with a much lower dimension. Such embedding methods allow complex spatial data to be used in neural networks and have been shown to improve performance in spatial analysis tasks == Embedded data types == Geographic data can take many forms: text, images, graphs, trajectories, polygons. Depending on the task, there may be a need to combine multimodal data from different sources. The next section describes examples of different types of data and their uses. === Text === Geolocated posts on social media can be used to acquire a library of documents bound to a given place that can be later transformed to embedded vectors using word embedding techniques. === Image === Satellites and aircraft collect digital spatial data acquired from remotely sensed images which can be used in machine learning. They are sometimes hard to analyse using basic image analysis methods and convolutional neural networks can be used to acquire an embedding of images bound to a given geographical object or a region. === Point === A single point of interest (POI) can be assigned multiple features that can be used in machine learning. These could be demographic, transportation, meteorological, or economic data, for example. When embedding single points, it is common to consider the entire set of available points as nodes in a graph. === Line / multiline === Among other things, motion trajectories are represented as lines (multilines). Individual trajectories are embedded taking into account travel time, distances and also features of points visited along the way. Embedding of trajectories allows to improve performance of such tasks as clustering and also categorization. === Polygon === The geographic areas analyzed in machine learning are defined by both administrative boundaries and top-down division into grids of regular shapes such as rectangles, for example. Both types are represented as polygons and, like points, can be assigned different demographic, transportation, or economic features. A polygon can also have features related to the size of the area or shape it represents. === Graph === An example domain where graph representation is used is the street layout in a city, where vertices can be intersections and edges can be roads. The vertices can also be destination points like public transport stops or important points in the city, and the edges represent the flow between them. Embedding graphs or single vertices allows to improve accuracy of analysis methods in which the treated geographical domain can be represented as a network. == Usage == POI recommendation - generating personalized point of interest recommendations based on user preferences. Next/future location prediction - prediction of the next location a person will go to based on their historical trajectory. Zone functions classification - based on different mobility of people or POI distribution a function of a given area in a city can be predicted. Crime prediction - estimation of crime rate in different regions of a city. Local event detection - studying spatio-temporal changes in embeddings can provide valuable information in detection of local event occurring in specific location. Regional mobility popularity prediction - analysis of mobility can show patterns in popularity of different regions in a city. Shape matching - finding a similar shape of given polygon, for example finding building with the same shape as input building. Travel time estimation - predicting estimated travel time given current traffic conditions and special occurring events. Time estimation for on-demand food delivery - estimation of delivery time when placing an order through the website. == Temporal aspect == Some of the data analyzed has a timestamp associated with it. In some cases of data analysis this information is omitted and in others it is used to divide the set into groups. The most common division is the separation of weekdays from weekends or division into hours of the day. This is particularly important in the analysis of mobility data, because the characteristics of mobility during the week and at different times of the day are very different from each other. Another area in which time division into, for example, individual months can be used is in the analysis of tourism of a given region. In order to take such a split into account, embedding methods treat the time stamp specifically or separate versions of the model are developed for different subgroups of the analyzed set.

    Read more →
  • Discoverability

    Discoverability

    Discoverability is the degree to which something, especially a piece of content or information, can be found in a search of a file, database, or other information system. Discoverability is a concern in library and information science, many aspects of digital media, software and web development, and in marketing, since products and services cannot be used if people cannot find it or do not understand what it can be used for. In human-computer interaction the term is further used to describe the discoverability of interactions, features and interactive systems overall . Metadata, or "information about information", such as a book's title, a product's description, or a website's keywords, affects how discoverable something is on a database or online. Adding metadata to a product that is available online can make it easier for end users to find the product. For example, if a song file is made available online, making the title, band name, genre, year of release, and other pertinent information available in connection with this song means the file can be retrieved more easily. The organization of information through the implementation of alphabetical structures or the integration of content into search engines exemplifies strategies employed to enhance the discoverability of information. The concept of discoverability, while related to but distinct from accessibility and usability, which are other qualities that affect the usefulness of a piece of information, is a critical aspect of information retrieval. == Etymology == The concept of "discoverability" in an information science and online context is a loose borrowing from the concept of the similar name in the legal profession. In law, "discovery" is a pre-trial procedure in a lawsuit in which each party, through the law of civil procedure, can obtain evidence from the other party or parties by means of discovery devices such as a request for answers to interrogatories, request for production of documents, request for admissions and depositions. Discovery can be obtained from non-parties using subpoenas. When a discovery request is objected to, the requesting party may seek the assistance of the court by filing a motion to compel discovery. == Purpose == The usability of any piece of information directly relates to how discoverable it is, either in a "walled garden" database or on the open Internet. The quality of information available on this database or on the Internet depends upon the quality of the meta-information about each item, product, or service. In the case of a service, because of the emphasis placed on service reusability, opportunities should exist for reuse of this service. However, reuse is only possible if information is discoverable in the first place. To make items, products, and services discoverable, the process is as follows: Document the information about the item, product or service (the metadata) in a consistent manner. Store the documented information (metadata) in a searchable repository. while technically a human-searchable repository, such as a printed paper list would qualify, "searchable repository" is usually taken to mean a computer-searchable repository, such as a database that a human user can search using some type of search engine or "find" feature. Enable search for the documented information in an efficient manner. supports number 2, because while reading through a printed paper list by hand might be feasible in a theoretical sense, it is not time and cost-efficient in comparison with computer-based searching. Apart from increasing the reuse potential of the services, discoverability is also required to avoid development of solution logic that is already contained in an existing service. To design services that are not only discoverable but also provide interpretable information about their capabilities, the service discoverability principle provides guidelines that could be applied during the service-oriented analysis phase of the service delivery process. === Specific to digital media === In relation to audiovisual content, according to the meaning given by the Canadian Radio-television and Telecommunications Commission (CRTC) for the purpose of its 2016 Discoverability Summit, discoverability can be summed up to the intrinsic ability of given content to "stand out of the lot", or to position itself so as to be easily found and discovered. A piece of audiovisual content can be a movie, a TV series, music, a book (eBook), an audio book or podcast. When audiovisual content such as a digital file for a TV show, movie, or song, is made available online, if the content is "tagged" with identifying information such as the names of the key artists (e.g., actors, directors and screenwriters for TV shows and movies; singers, musicians and record producers for songs) and the genres (for movies genres, music genres, etc.). When users interact with online content, algorithms typically determine what types of content the user is interested in, and then a computer program suggests "more like this", which is other content that the user may be interested in. Different websites and systems have different algorithms, but one approach, used by Amazon (company) for its online store, is to indicate to a user: "customers who bought x also bought y" (affinity analysis, collaborative filtering). This example is oriented around online purchasing behaviour, but an algorithm could also be programmed to provide suggestions based on other factors (e.g., searching, viewing, etc.). Discoverability is typically referred to in connection with search engines. A highly "discoverable" piece of content would appear at the top, or near the top of a user's search results. A related concept is the role of "recommendation engines", which give a user recommendations based on his/her previous online activity. Discoverability applies to computers and devices that can access the Internet, including various console video game systems and mobile devices such as tablets and smartphones. When producers make an effort to promote content (e.g., a TV show, film, song, or video game), they can use traditional marketing (billboards, TV ads, radio ads) and digital ads (pop-up ads, pre-roll ads, etc.), or a mix of traditional and digital marketing. Even before the user's intervention by searching for a certain content or type of content, discoverability is the prime factor which contributes to whether a piece of audiovisual content will be likely to be found in the various digital modes of content consumption. As of 2017, modes of searching include looking on Netflix for movies, Spotify for music, Audible for audio books, etc., although the concept can also more generally be applied to content found on Twitter, Tumblr, Instagram, and other websites. It involves more than a content's mere presence on a given platform; it can involve associating this content with "keywords" (tags), search algorithms, positioning within different categories, metadata, etc. Thus, discoverability enables as much as it promotes. For audiovisual content broadcast or streamed on digital media using the Internet, discoverability includes the underlying concepts of information science and programming architecture, which are at the very foundation of the search for a specific product, information or content. === Human-Computer Interaction === In human–computer interaction (HCI), discoverability refers to the ability of users to perceive and comprehend a system, function, or input method upon encountering it, despite a lack of prior awareness or knowledge, whether through intentional effort or serendipitously . The concept was popularised by Don Norman, who framed it around whether users can determine what actions are possible and how to perform them . Discoverability is considered a precondition for learnability, though the two concepts are frequently conflated in the literature . == Applications == === Within a webpage === Within a specific webpage or software application ("app"), the discoverability of a feature, content or link depends on a range of factors, including the size, colour, highlighting features, and position within the page. When colour is used to communicate the importance of a feature or link, designers typically use other elements as well, such as shadows or bolding, for individuals, who cannot see certain colours. Just as traditional paper printing created other physical locations that stood out, such as being "above the fold" of a newspaper versus "below the fold", a web page or app's screenview may have certain locations that give features additional visibility to users, such as being right at the bottom of the web page or screen. The positional advantages or disadvantages of various locations depend on different cultures and languages (e.g., left to right vs. right to left). Some locations have become established, such as having toolbars at the top of a screen or webpage. Some designers have argued t

    Read more →
  • Artificial Intelligence Applications Institute

    Artificial Intelligence Applications Institute

    The Artificial Intelligence Applications Institute (AIAI) at the School of Informatics at the University of Edinburgh is a non-profit technology transfer organisation that promoted research in the field of artificial intelligence. == History == The Artificial Intelligence Applications Institute (AIAI) was founded in 1983 at the University of Edinburgh as a specialist research and technology-transfer unit focusing on the practical uses of artificial intelligence (AI). The institute was established by Professor Jim Howe and colleagues from the Science and Engineering Research Council (SERC) Special Interest Group in AI in the Department of Artificial Intelligence, with a mission to apply AI techniques to solve real-world industrial and governmental problems. Under the directorship of Austin Tate, who served from 1985 to 2019, AIAI became one of the leading UK research centres devoted to AI programming systems, intelligent planning systems, decision support, and knowledge-based engineering. It collaborated with both academic partners and international organisations such as the European Space Agency and the UK Ministry of Defence. In 2001, AIAI joined the newly created Centre for Intelligent Systems and their Applications (CISA) within the University's School of Informatics. In December 2019, the institute was renamed the Artificial Intelligence and its Applications Institute to reflect a broader integration of fundamental and applied AI research. == Research programmes == AIAI’s research spans multiple areas of artificial intelligence, including: AI programming Systems - Edinburgh Prolog, Edinburgh Common Lisp, Logo; Knowledge representation and reasoning – development of ontologies, rule-based inference, and semantic modelling; Automated planning and scheduling – intelligent task management systems used in aerospace, manufacturing, and emergency response; Natural language processing and intelligent agents – interaction frameworks for human–computer collaboration; AI ethics and decision-making – research into responsible deployment and evaluation of autonomous systems. The institute also contributes to interdisciplinary fields such as computational creativity, explainable AI, and human–AI interaction. AIAI maintains close collaboration with the Bayes Centre and the Alan Turing Institute through joint research programmes and doctoral training initiatives. == Technology transfer and impact == From its inception, AIAI has combined academic research with technology-transfer activity, offering professional training, industrial consultancy, and bespoke software systems. It pioneered one of the earliest knowledge-based project-management systems, O-Plan, later evolved into the I-Plan framework used for autonomous planning and workflow management.

    Read more →