AI Email Enhancer

AI Email Enhancer — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Be My Eyes

    Be My Eyes

    Be My Eyes is a Danish mobile app that aims to help blind and visually impaired people to recognize objects and manage everyday situations. An online community of sighted volunteers receive photos or videos from randomly assigned affected individuals and assist via live chat. In 2023, the company launched Be My AI, an AI-based interface to help blind and visually impaired users describe images. The app is currently available for Android, iOS, and Windows. == History == === Founding and early years === The app was developed and marketed by Hans Jørgen Wiberg. He had demonstrated that although there are video chat software such as Skype and FaceTime, none is tailored for the visually impaired. For development, he joined forces with the Danish Association of the Blind, and other organizations. The app was first presented at an event for start-up companies in 2012 and first released in 2015. A version for Android was released in 2017, in addition to the iOS version. Praise was given for easy use of the app. The lack of sufficient data protection, which makes it possible to pass on data to third parties, was criticized. === Recent developments === The company has raised over $650,000, including funding from Silicon Valley, Microsoft, and other angel investors. In February 2020, $2.8 million in Series A funding was raised, allowing the company to further develop its business model while keeping visual support services free for visually impaired users. The investment allows the company to further develop its unique "purpose and profit" business model while keeping the visual support service free and unlimited for all visually impaired users. === User base and accessibility === Over 9.3 million volunteers and 900,000 blind or visually impaired people use the app. == Features == === Human-based assistance === A visually impaired person starts a live stream showing their view from their cellphone camera. They are assigned, through a phone call or chat, a random volunteer who speaks the same language and who is in the same time zone. This allows the volunteer to describe an object and assist the visually impaired person, such as guiding the person to move their camera, read instructions, or clean up a spill. Through speech synthesis, content can be read out loud. This process encourages a more independent life for blind and visually impaired people. === Be My AI === In March of 2023, Be My Eyes launched Be My AI, an AI-based virtual assistant. Be My AI is accessible through the Be My Eyes app, and is based on OpenAI's GPT-4 large language model. Through the interface, the app allows blind and visually impaired users to send images from a variety of devices to be described. The app allows users to then follow up with questions to further tailor the image description. Blind users report using Be My AI for a variety of tasks, including reading menus, identifying clothing, and describing people. The Be My AI interface is available on Android, iOS, and Windows. Within a few weeks of the interface's roll out, the company reported that it had been used one million times, and it was named among Time's best inventions of 2023. Be My AI is part of a growing number of AI-based apps and devices designed to help blind and visually impaired individuals. == Partnerships == === Microsoft === In November 2023, Be My Eyes entered a partnership with Microsoft to share data to help improve accessibility-focused AI models. === Meta === In 2024, Be My Eyes integrated with Ray-Ban Meta smart glasses, a wearable product developed by Meta and EssilorLuxottica. The partnership enabled users to receive hands-free, real-time visual descriptions and volunteer assistance by using voice commands through the smart glasses. === Hilton === In October 2024, Hilton partnered with Be My Eyes to provide live video assistance for blind and low-vision guests. The free service connects travelers to a Hilton team member that can guide them through tasks like adjusting thermostats, opening window shades, or navigating hotel amenities. This collaboration progressed from a prior arrangement where Hilton helped train Be My Eyes' GPT-4 powered AI model to better recognize objects and layouts in hotel rooms. === Tesco === In October 2025, retailer Tesco announced its partnership with Be My Eyes to launch a six-month pilot aimed at improving in-store accessibility in the UK. The initiative was launched on World Sight Day, 9 October, enabling Be My Eyes users to connect directly with Tesco staff via the app for personalised visual assistance while shopping, Euronewsweek reported. == Awards == Nordic Startup Awards for "Best Social Entrepreneurial Tech Startup" in Denmark 2021 Apple Design Award for best social impact

    Read more →
  • Landmark point

    Landmark point

    In morphometrics, landmark point or shortly landmark is a point in a shape object in which correspondences between and within the populations of the object are preserved. In other disciplines, landmarks may be known as vertices, anchor points, control points, sites, profile points, 'sampling' points, nodes, markers, fiducial markers, etc. Landmarks can be defined either manually by experts or automatically by a computer program. There are three basic types of landmarks: anatomical landmarks, mathematical landmarks or pseudo-landmarks. An anatomical landmark is a biologically-meaningful point in an organism. Usually experts define anatomical points to ensure their correspondences within the same species. Examples of anatomical landmark in shape of a skull are the eye corner, tip of the nose, jaw, etc. Anatomical landmarks determine homologous parts of an organism, which share a common ancestry. Mathematical landmarks are points in a shape that are located according to some mathematical or geometrical property, for instance, a high curvature point or an extreme point. A computer program usually determines mathematical landmarks used for an automatic pattern recognition. Pseudo-landmarks are constructed points located between anatomical or mathematical landmarks. A typical example is an equally spaced set of points between two anatomical landmarks to get more sample points from a shape. Pseudo-landmarks are useful during shape matching, when the matching process requires a large number of points.

    Read more →
  • Spleak

    Spleak

    Spleak was an IM platform where users could publish and rate content. It existed in the form of six bots covering as many subject areas: CelebSpleak, SportSpleak, VoteSpleak, TVSpleak, GameSpleak, and StyleSpleak. == Overview == Users can add a "multi-Spleak" (which contains all of the different Spleak bots in one) or add the separate bots to their IM buddy lists on MSN and AIM. Users are also allowed access to Spleak online by using a CelebSpleak, SportSpleak, or VoteSpleak widget, or through the CelebSpleak and SportSpleak applications with Facebook. Spleak was an alternate reality game and is moving to its own company, Spleak Media Network. "Celebrate Spleak" was introduced throughout 2007, launched in 2008, and was forced to retire in 2009. == Key people == Spleak was co-founded by Morten Lund and Nicolaj Reffstrup. The company's chief executive officer is Morrie Eisenburg; Josh Scott is Vice President in Product and Tyler Wells is Vice President in Engineering.

    Read more →
  • Just This Once

    Just This Once

    Just This Once is a 1993 romance novel written in the style of Jacqueline Susann by a Macintosh IIcx computer named "Hal" in collaboration with its programmer, Scott French. French reportedly spent $40,000 and 8 years developing an artificial intelligence program to analyze Susann's works and attempt to create a novel that Susann might have written. A legal dispute between the estate of Jacqueline Susann and the publisher resulted in a settlement to split the profits, and the book was referenced in several legal journal articles about copyright laws. The book had two small print runs totaling 35,000 copies, receiving mixed reviews. == Creation == The novel's creation spanned the fields of artificial intelligence, expert systems, and natural language processing. Scott French first scanned and analyzed portions of two books by Jacqueline Susann, Valley of the Dolls and Once Is Not Enough, to determine constituents of Susann's writing style, which French stated was the most difficult task. This analysis extracted several hundred components including frequency and type of sexual acts and sentence structure. "Once you're there, the writer's style emerges, part of her actual personality comes out, and the computer can be programmed to make a story." French also created several thousand rules to govern tone, plotting, scenes, and characters. The text generated by Hal, the computer, was intended to mimic what Susann might have written, although the output required significant editing. French credits Hal's work with "almost 100% of the plot, 100% of the theme and style." French estimates that he wrote 10% of the prose, the computer Hal wrote about 25% of the prose, and the remaining two-thirds was more of a collaboration between the two. A typical scenario to write a scene would involve Hal asking questions that French would answer (for example, Hal might ask about the "cattiness factor" involved in a meeting between two key female characters, and French would reply with a range of 1 to 10), and the computer would then generate a few sentences to which French would make minor edits. The process would repeat for the next few sentences until the scene was written. == Legal issues == Jacqueline Susann's publisher was skeptical of the legality of Just This Once, although French doubted that an author's thought processes could be copyrighted. Susann's estate reportedly threatened to sue Scott French but the parties settled out of court; the settlement involved splitting profits between the parties but the terms of the settlement were not disclosed. The publication of Just This Once raised questions in the legal profession concerning how copyright law applies to computer-generated works derived from an analysis of other copyrighted works, and whether the generation of such works infringes on copyright. The publications on this topic suggested that the copyright laws of the time were ill-equipped to deal with computer-generated creative works. == Reception == The book's publisher Steven Shragis of Carol Group said of the novel, "I'm not going to say this is a great literary work, but it's every bit as good as anything out in this field, and better than an awful lot." The novel received some positive early reviews. In USA Today, novelist Thomas Gifford compared Just This Once to another novel in the same genre, American Star by Jackie Collins. Gifford concluded: "If you do like this stuff, you'd be much, much better off with the one written by the computer." The Dead Jackie Susann Quarterly declared that Susann "would be proud. Lots of money, sleaze, disease, death, oral sex, tragedy and the good girl gone bad." Other reviews were mixed. Publishers Weekly wrote, "If the books of Jacqueline Susann and Harold Robbins seem formulaic, this debut novel of sin and success in Las Vegas outdoes them all. And that, in a way, is the point.... All novelty rests in the conceit of computer authorship, not in the story itself." Library Journal stated "French invested eight years and $50,000 in a scheme to use artificial intelligence to fulfill his authentic, if dubious, desire to generate a trashy novel a la Jacqueline Susann. Shallow, beautiful-people characters are flatly conceived and randomly accessed in a formulaic plot ... a sexy, boring morality tale. Of possible interest to computer buffs for its use of Expert Systems and the virtual promise of more worthy possibilities; others should read Susann." Kirkus Reviews wrote: "The deal here is that author French is not the author, he's just the midwife, having allegedly programmed his computer to write about our times just the way Susann would... almost perfectly capturing glamorous Jackie's turgid but E-Z reading prose style and ultrareliable mix of sex, glitz, dope 'n' despair.... One wonders, though, if French's tale spinning PC will do as well on the talkshows as Jackie did. The computer weenies have been trying to tell us for years, garbage in-garbage out."

    Read more →
  • Computer security

    Computer security

    Computer security (also cybersecurity, digital security, or information technology (IT) security) is a subdiscipline within the field of information security. It focuses on protecting computer software, systems, and networks from threats that can lead to unauthorized information disclosure, theft, or damage to hardware, software, or data, as well as to the disruption or misdirection of the services they provide. The growing significance of computer security reflects the increasing dependence on computer systems, the Internet, and evolving wireless network standards. This reliance has expanded with the proliferation of smart devices, including smartphones, televisions, and other components of the Internet of things (IoT). As digital infrastructure becomes more embedded in everyday life, cybersecurity has emerged as a critical concern. The complexity of modern information systems—and the societal functions they underpin—has introduced new vulnerabilities. Systems that manage essential services, such as power grids, electoral processes, and finance, are particularly sensitive to security breaches. Although many aspects of computer security involve digital security, such as electronic passwords and encryption, physical security measures, such as metal locks, are still used to prevent unauthorized tampering. IT security is not a perfect subset of information security and therefore does not completely align with the security convergence schema. == Vulnerabilities and attacks == A vulnerability refers to a flaw in the structure, execution, functioning, or internal oversight of a computer or system that compromises its security. Most of the vulnerabilities that have been discovered are documented in the Common Vulnerabilities and Exposures (CVE) database. An exploitable vulnerability is one for which at least one working exploit exists. Actors maliciously seeking vulnerabilities are known as threats. Vulnerabilities can be researched, reverse-engineered, hunted, or exploited using automated tools or customized scripts. Various people or parties are vulnerable to cyberattacks; however, different groups are likely to experience different types of attacks more than others. In April 2023, the United Kingdom Department for Science, Innovation & Technology released a report on cyberattacks over the previous 12 months. They surveyed 2,263 UK businesses, 1,174 UK registered charities, and 554 education institutions. The research found that "32% of businesses and 24% of charities overall recall any breaches or attacks from the last 12 months." These figures were much higher for "medium businesses (59%), large businesses (69%), and high-income charities with £500,000 or more in annual income (56%)." Yet, although medium or large businesses are more often the victims, since larger companies have generally improved their security over the last decade, small and midsize businesses (SMBs) have also become increasingly vulnerable as they often "do not have advanced tools to defend the business." SMBs are most likely to be affected by malware, ransomware, phishing, man-in-the-middle attacks, and Denial-of Service (DoS) Attacks. Normal internet users are most likely to be affected by untargeted cyberattacks. These are where attackers indiscriminately target as many devices, services, or users as possible. They do this using techniques that take advantage of the openness of the Internet. These strategies mostly include phishing, ransomware, water holing and scanning. To secure a computer system, it is important to understand the attacks that can be made against it, and these threats can typically be classified into one of the following categories: === Backdoor === A backdoor in a computer system, a cryptosystem or an algorithm, is any secret method of bypassing normal authentication or security controls. These weaknesses may exist for many reasons, including original design or poor configuration. Due to the nature of backdoors, they are of greater concern to companies and databases as opposed to individuals. Backdoors may be added by an authorized party to allow some legitimate access or by an attacker for malicious reasons. Criminals often use malware to install backdoors, giving them remote administrative access to a system. Once they have access, cybercriminals can "modify files, steal personal information, install unwanted software, and even take control of the entire computer." Backdoors can be difficult to detect, as they often remain hidden within source code or system firmware and may require intimate knowledge of the operating system to identify. === Denial-of-service attack === Denial-of-service attacks (DoS) are designed to make a machine or network resource unavailable to its intended users. Attackers can deny service to individual victims, such as by deliberately entering an incorrect password enough consecutive times to cause the victim's account to be locked, or they may overload the capabilities of a machine or network and block all users at once. While a network attack from a single IP address can be blocked by adding a new firewall rule, many forms of distributed denial-of-service (DDoS) attacks are possible, where the attack comes from a large number of points. In this case, defending against these attacks is much more difficult. Such attacks can originate from the zombie computers of a botnet or from a range of other possible techniques, including distributed reflective denial-of-service (DRDoS), where innocent systems are fooled into sending traffic to the victim. With such attacks, the amplification factor makes the attack easier for the attacker because they have to use little bandwidth themselves. To understand why attackers may carry out these attacks, see the 'attacker motivation' section. === Physical access attacks === A direct-access attack is when an unauthorized user (an attacker) gains physical access to a computer, typically to copy data from it or steal information. Attackers may also compromise security by making operating system modifications, installing software worms, keyloggers, covert listening devices or using wireless microphones. Even when the system is protected by standard security measures, these may be bypassed by booting another operating system or tool from a CD-ROM or other bootable media. Disk encryption and the Trusted Platform Module standard are designed to prevent these attacks. Direct service attackers are related in concept to direct memory attacks which allow an attacker to gain direct access to a computer's memory. The attacks "take advantage of a feature of modern computers that allows certain devices, such as external hard drives, graphics cards, or network cards, to access the computer's memory directly." === Eavesdropping === Eavesdropping is the act of surreptitiously listening to a private computer conversation (communication), usually between hosts on a network. It typically occurs when a user connects to a network where traffic is not secured or encrypted and sends sensitive business data to a colleague, which, when listened to by an attacker, could be exploited. Data transmitted across an open network can be intercepted by an attacker using various methods. Unlike malware, direct-access attacks, or other forms of cyberattacks, eavesdropping attacks are unlikely to negatively affect the performance of networks or devices, making them difficult to notice. In fact, "the attacker does not need to have any ongoing connection to the software at all. The attacker can insert the software onto a compromised device, perhaps by direct insertion or perhaps by a virus or other malware, and then come back some time later to retrieve any data that is found or trigger the software to send the data at some determined time." Using a virtual private network (VPN), which encrypts data between two points, is one of the most common forms of protection against eavesdropping. Using the best form of encryption possible for wireless networks is best practice, as well as using HTTPS instead of an unencrypted HTTP. Programs such as Carnivore and NarusInSight have been used by the Federal Bureau of Investigation (FBI) and the NSA to eavesdrop on the systems of internet service providers. Even machines that operate as a closed system (i.e., with no contact with the outside world) can be eavesdropped upon by monitoring the faint electromagnetic transmissions generated by the hardware. TEMPEST is a specification by the NSA referring to these attacks. === Malware === Malicious software (malware) is any software code or computer program "intentionally written to harm a computer system or its users." Once present on a computer, it can leak sensitive details such as personal information, business information and passwords, can give control of the system to the attacker, and can corrupt or delete data permanently. ==== Types of malware ==== Viruses are a specific type of malware, and are normally a malicious code that hijac

    Read more →
  • Legal information retrieval

    Legal information retrieval

    Legal information retrieval is the science of information retrieval applied to legal text, including legislation, case law, and scholarly works. Accurate legal information retrieval is important to provide access to the law to laymen and legal professionals. Its importance has increased because of the vast and quickly increasing amount of legal documents available through electronic means. Legal information retrieval is a part of the growing field of legal informatics. In a legal setting, it is frequently important to retrieve all information related to a specific query. However, commonly used boolean search methods (exact matches of specified terms) on full text legal documents have been shown to have an average recall rate as low as 20 percent, meaning that only 1 in 5 relevant documents are actually retrieved. In that case, researchers believed that they had retrieved over 75% of relevant documents. This may result in failing to retrieve important or precedential cases. In some jurisdictions this may be especially problematic, as legal professionals are ethically obligated to be reasonably informed as to relevant legal documents. Legal Information Retrieval attempts to increase the effectiveness of legal searches by increasing the number of relevant documents (providing a high recall rate) and reducing the number of irrelevant documents (a high precision rate). This is a difficult task, as the legal field is prone to jargon, polysemes (words that have different meanings when used in a legal context), and constant change. Techniques used to achieve these goals generally fall into three categories: boolean retrieval, manual classification of legal text, and natural language processing of legal text. == Problems == Application of standard information retrieval techniques to legal text can be more difficult than application in other subjects. One key problem is that the law rarely has an inherent taxonomy. Instead, the law is generally filled with open-ended terms, which may change over time. This can be especially true in common law countries, where each decided case can subtly change the meaning of a certain word or phrase. Legal information systems must also be programmed to deal with law-specific words and phrases. Though this is less problematic in the context of words which exist solely in law, legal texts also frequently use polysemes, words may have different meanings when used in a legal or common-speech manner, potentially both within the same document. The legal meanings may be dependent on the area of law in which it is applied. For example, in the context of European Union legislation, the term "worker" has four different meanings: Any worker as defined in Article 3(a) of Directive 89/391/EEC who habitually uses display screen equipment as a significant part of his normal work. Any person employed by an employer, including trainees and apprentices but excluding domestic servants; Any person carrying out an occupation on board a vessel, including trainees and apprentices, but excluding port pilots and shore personnel carrying out work on board a vessel at the quayside; Any person who, in the Member State concerned, is protected as an employee under national employment law and in accordance with national practice; It also has the common meaning: A person who works at a specific occupation. Though the terms may be similar, correct information retrieval must differentiate between the intended use and irrelevant uses in order to return the correct results. Even if a system overcomes the language problems inherent in law, it must still determine the relevancy of each result. In the context of judicial decisions, this requires determining the precedential value of the case. Case decisions from senior or superior courts may be more relevant than those from lower courts, even where the lower court's decision contains more discussion of the relevant facts. The opposite may be true, however, if the senior court has only a minor discussion of the topic (for example, if it is a secondary consideration in the case). An information retrieval system must also be aware of the authority of the jurisdiction. A case from a binding authority is most likely of more value than one from a non-binding authority. Additionally, the intentions of the user may determine which cases they find valuable. For instance, where a legal professional is attempting to argue a specific interpretation of law, he might find a minor court's decision which supports his position more valuable than a senior courts position which does not. He may also value similar positions from different areas of law, different jurisdictions, or dissenting opinions. Overcoming these problems can be made more difficult because of the large number of cases available. The number of legal cases available via electronic means is constantly increasing (in 2003, US appellate courts handed down approximately 500 new cases per day), meaning that an accurate legal information retrieval system must incorporate methods of both sorting past data and managing new data. == Techniques == === Boolean searches === Boolean searches, where a user may specify terms such as use of specific words or judgments by a specific court, are the most common type of search available via legal information retrieval systems. They are widely implemented but overcome few of the problems discussed above. The recall and precision rates of these searches vary depending on the implementation and searches analyzed. One study found a basic boolean search's recall rate to be roughly 20%, and its precision rate to be roughly 79%. Another study implemented a generic search (that is, not designed for legal uses) and found a recall rate of 56% and a precision rate of 72% among legal professionals. Both numbers increased when searches were run by non-legal professionals, to a 68% recall rate and 77% precision rate. This is likely explained because of the use of complex legal terms by the legal professionals. === Manual classification === In order to overcome the limits of basic boolean searches, information systems have attempted to classify case laws and statutes into more computer friendly structures. Usually, this results in the creation of an ontology to classify the texts, based on the way a legal professional might think about them. These attempt to link texts on the basis of their type, their value, and/or their topic areas. Most major legal search providers now implement some sort of classification search, such as Westlaw's “Natural Language” or LexisNexis' Headnote searches. Additionally, both of these services allow browsing of their classifications, via Westlaw's West Key Numbers or Lexis' Headnotes. Though these two search algorithms are proprietary and secret, it is known that they employ manual classification of text (though this may be computer-assisted). These systems can help overcome the majority of problems inherent in legal information retrieval systems, in that manual classification has the greatest chances of identifying landmark cases and understanding the issues that arise in the text. In one study, ontological searching resulted in a precision rate of 82% and a recall rate of 97% among legal professionals. The legal texts included, however, were carefully controlled to just a few areas of law in a specific jurisdiction. The major drawback to this approach is the requirement of using highly skilled legal professionals and large amounts of time to classify texts. As the amount of text available continues to increase, some have stated their belief that manual classification is unsustainable. === Natural language processing === In order to reduce the reliance on legal professionals and the amount of time needed, efforts have been made to create a system to automatically classify legal text and queries. Adequate translation of both would allow accurate information retrieval without the high cost of human classification. These automatic systems generally employ Natural Language Processing (NLP) techniques that are adapted to the legal domain, and also require the creation of a legal ontology. Though multiple systems have been postulated, few have reported results. One system, “SMILE,” which attempted to automatically extract classifications from case texts, resulted in an f-measure (which is a calculation of both recall rate and precision) of under 0.3 (compared to perfect f-measure of 1.0). This is probably much lower than an acceptable rate for general usage. Despite the limited results, many theorists predict that the evolution of such systems will eventually replace manual classification systems. === Citation-Based ranking === In the mid-90s the Room 5 case law retrieval project used citation mining for summaries and ranked its search results based on citation type and count. This slightly pre-dated the PageRank algorithm at Stanford which was also a citation-based ranking. Ranking of results was based

    Read more →
  • Retrieval-augmented generation

    Retrieval-augmented generation

    Retrieval-augmented generation (RAG) is a technique that enables large language models (LLMs) to retrieve and incorporate new information from external data sources. With RAG, LLMs first refer to a specified set of documents, then respond to user queries. These documents supplement information from the LLM's pre-existing training data. This allows LLMs to use domain-specific and/or updated information that is not available in the training data. For example, this enables LLM-based chatbots to access internal company data or generate responses based on authoritative sources. RAG improves LLMs by incorporating information retrieval before generating responses. Unlike LLMs that rely on static training data, RAG pulls relevant text from databases, uploaded documents, or web sources. According to Ars Technica, "RAG is a way of improving LLM performance, in essence by blending the LLM process with a web search or other document look-up process to help LLMs stick to the facts." This method helps reduce AI hallucinations, which have caused chatbots to describe policies that don't exist, or recommend nonexistent legal cases to lawyers that are looking for citations to support their arguments. RAG also reduces the need to retrain LLMs with new data, saving on computational and financial costs. Beyond efficiency gains, RAG also allows LLMs to include sources in their responses, so users can verify the cited sources. This provides greater transparency, as users can cross-check retrieved content to ensure accuracy and relevance. The term retrieval-augmented generation (RAG) was introduced in a 2020 paper that described combining a parametric language model with a non-parametric external memory accessed through retrieval at inference time. == RAG and LLM limitations == LLMs can provide incorrect information. For example, when Google first demonstrated its LLM tool "Google Bard" (later re-branded to Gemini), the LLM provided incorrect information about the James Webb Space Telescope. This error contributed to a $100 billion decline in Google's stock value. RAG is used to prevent these errors, but it does not solve all the problems. For example, LLMs can generate misinformation even when pulling from factually correct sources if they misinterpret the context. MIT Technology Review gives the example of an AI-generated response stating, "The United States has had one Muslim president, Barack Hussein Obama." The model retrieved this from an academic book rhetorically titled Barack Hussein Obama: America's First Muslim President? The LLM did not "know" or "understand" the context of the title, generating a false statement. LLMs with RAG are programmed to prioritize new information. This technique has been called "prompt stuffing." Without prompt stuffing, the LLM's input is generated by a user; with prompt stuffing, additional relevant context is added to this input to guide the model's response. This approach provides the LLM with key information early in the prompt, encouraging it to prioritize the supplied data over pre-existing training knowledge. == Process == Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating an information-retrieval mechanism that allows models to access and utilize additional data beyond their original training set. Ars Technica notes that "when new information becomes available, rather than having to retrain the model, all that's needed is to augment the model's external knowledge base with the updated information" ("augmentation"). IBM states that "in the generative phase, the LLM draws from the augmented prompt and its internal representation of its training data to synthesize" an answer. === RAG key stages === Typically, the data to be referenced is converted into LLM embeddings, numerical representations in the form of a large vector space. RAG can be used on unstructured (usually text), semi-structured, or structured data (for example knowledge graphs). These embeddings are then stored in a vector database to allow for document retrieval. Given a user query, a document retriever is first called to select the most relevant documents that will be used to augment the query. This comparison can be done using a variety of methods, which depend in part on the type of indexing used. The model feeds this relevant retrieved information into the LLM via prompt engineering of the user's original query. Newer implementations (as of 2023) can also incorporate specific augmentation modules with abilities such as expanding queries into multiple domains and using memory and self-improvement to learn from previous retrievals. Finally, the LLM can generate output based on both the query and the retrieved documents. Some models incorporate extra steps to improve output, such as the re-ranking of retrieved information, context selection, and fine-tuning. == Applications == Retrieval-augmented generation is used in applications where generated responses need to be grounded in external or frequently updated information. Commonly cited use cases include search engines, question-answering systems, customer support chatbots, enterprise knowledge assistants, content generation, recommendation systems, retail and e-commerce, and industrial or manufacturing workflows. In healthcare, RAG has been studied as a way to ground large language model outputs in external medical knowledge sources, although reviews have noted continuing challenges around evaluation, ethics, and clinical reliability. == Improvements == Improvements to the basic process above can be applied at different stages in the RAG flow. === Encoder === These methods focus on the encoding of text as either dense or sparse vectors. Sparse vectors, which encode the identity of a word, are typically dictionary-length and contain mostly zeros. Dense vectors, which encode meaning, are more compact and contain fewer zeros. Various enhancements can improve the way similarities are calculated in the vector stores (databases). Performance improves by optimizing how vector similarities are calculated. Dot products enhance similarity scoring, while approximate nearest neighbor (ANN) searches improve retrieval efficiency over K-nearest neighbors (KNN) searches. Accuracy may be improved with Late Interactions, which allow the system to compare words more precisely after retrieval. This helps refine document ranking and improve search relevance. Hybrid vector approaches may be used to combine dense vector representations with sparse one-hot vectors, taking advantage of the computational efficiency of sparse dot products over dense vector operations. Other retrieval techniques focus on improving accuracy by refining how documents are selected. Some retrieval methods combine sparse representations, such as SPLADE, with query expansion strategies to improve search accuracy and recall. === Retriever-centric methods === These methods aim to enhance the quality of document retrieval in vector databases: Pre-training the retriever using the Inverse Cloze Task (ICT), a technique that helps the model learn retrieval patterns by predicting masked text within documents. Supervised retriever optimization aligns retrieval probabilities with the generator model's likelihood distribution. This involves retrieving the top-k vectors for a given prompt, scoring the generated response's perplexity, and minimizing KL divergence between the retriever's selections and the model's likelihoods to refine retrieval. Reranking techniques can refine retriever performance by prioritizing the most relevant retrieved documents during training. === Language model === By redesigning the language model with the retriever in mind, a 25-time smaller network can get comparable perplexity as its much larger counterparts. Because it is trained from scratch, this method (Retro) incurs the high cost of training runs that the original RAG scheme avoided. The hypothesis is that by giving domain knowledge during training, Retro needs less focus on the domain and can devote its smaller weight resources only to language semantics. The redesigned language model is shown here. It has been reported that Retro is not reproducible, so modifications were made to make it so. The more reproducible version is called Retro++ and includes in-context RAG. === Chunking === Chunking involves various strategies for breaking up the data into vectors so the retriever can find details in it. Three types of chunking strategies are: Fixed length with overlap. This is fast and easy. Overlapping consecutive chunks helps to maintain semantic context across chunks. Syntax-based chunks can break the document up into sentences. Libraries such as spaCy or NLTK can also help. File format-based chunking. Certain file types have natural chunks built in, and it's best to respect them. For example, code files are best chunked and vectorized as whole functions or classes. HTML files should leave

    or base64 encoded elements

    Read more →
  • Stochastic parrot

    Stochastic parrot

    In machine learning, the term stochastic parrot is a metaphor that frames large language models as systems that statistically mimic text without real understanding. The word "stochastic" – from the ancient Greek "στοχαστικός" (stokhastikos, 'based on guesswork') – is a term from probability theory meaning "randomly determined". The word "parrot" refers to parrots' ability to mimic human speech. The term was introduced in a 2021 paper on AI ethics titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" and authored by Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, and Margaret Mitchell. The paper outlined possible risks associated with large language models (LLMs). In December 2020, it was the subject of a workplace dispute between Gebru (then co-leader of Google's Ethical Artificial Intelligence Team) and Google, which had requested the retraction of the paper. The incident culminated in Gebru's controversial departure from the company. The paper was later presented at the 2021 ACM Conference, and the term "stochastic parrot" has seen widespread use in academic research concerning generative AI and LLMs. The term has been interpreted negatively as an insult towards AI. == Background == Timnit Gebru is an AI ethics researcher, Emily M. Bender is a linguist specializing in computational linguistics, and Margaret Mitchell is a computer scientist specializing in algorithmic bias. Gebru had joined Google in 2018, where she co-led a team on the ethics of artificial intelligence with Mitchell. In late 2020, the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜" was co-written by Gebru and five other researchers, four of whom were Google employees. The paper argues that large language models (LLMs) present significant risks such as environmental and financial costs, inscrutability leading to unknown dangerous biases, and potential for deception as LLMs do not understand the concepts underlying what they learn. The paper states that LLMs are "stitching together sequences of linguistic forms ... observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning." Therefore, they are labeled "stochastic parrots". === Dismissal of Gebru by Google === After the paper was submitted for consideration to the 2021 ACM Conference, Google requested that Gebru either retract the paper from the conference or remove the names of Google employees from it. Gebru refused to do so without further discussion, and emailed Google Research vice president Megan Kacholia that if the company could not explain the request for retraction and address other concerns regarding similar projects, she would plan to resign after a transition period, stating that they could "work on a last date". The following day, on December 2, 2020, Gebru received an email saying that Google was "accepting her resignation". Her abrupt firing sparked protests by Google employees and negative publicity for the company. == Usage == The phrase has been used by AI skeptics to signify that LLMs lack understanding of the meaning of their outputs. Sam Altman, CEO of OpenAI, used the term shortly after the release of ChatGPT in December 2022, tweeting "i am a stochastic parrot, and so r u". The term was nominated as the 2023 AI-related Word of the Year by the American Dialect Society. == Debate == Some LLMs, such as ChatGPT, have become capable of interacting with users in convincingly human-like conversations. The development of these new systems has deepened the discussion of the extent to which LLMs understand or are simply "parroting". According to machine learning researchers Lindholm, Wahlström, Lindsten, and Schön, the term "stochastic parrot" highlights two vital limitations of LLMs: LLMs are limited by the data they are trained on and are simply stochastically repeating contents of datasets. Because they are just making up outputs based on training data, LLMs do not understand if they are saying something incorrect or inappropriate. Lindholm et al. noted that, with poor quality datasets and other limitations, a learning machine might produce results that are "dangerously wrong". === Subjective experience === In the mind of a human being, words and language correspond to things one has experienced. For LLMs, according to proponents of the theory, words correspond only to other words and patterns of usage fed into their training data. Proponents of the idea of stochastic parrots thus conclude that statements about LLMs are due to "the human tendency to attribute meaning to text", and claim this occurs despite the LLMs not actually understanding language. === Fine-tuning === Kelsey Piper argued that the claim that LLMs are stochastic parrots or mere "next-token predictors" focuses on pre-training, ignoring that modern LLMs are also fine-tuned to follow instructions and to prefer accurate answers. === Hallucinations and mistakes === The tendency of LLMs to pass off false information as fact is held as support. Called hallucinations or confabulations, LLMs will occasionally synthesize information that matches some pattern. LLMs may fail to distinguish fact and fiction, which leads to the claim that they can't connect words to a comprehension of the world, as humans do. Furthermore, LLMs may fail to decipher complex or ambiguous grammar cases that rely on understanding the meaning of language. For example: The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore. Can I replace 'my favorite newspaper' by 'the wet newspaper that fell down off the table' in the second sentence? GPT-4, an LLM released in March 2023, responded yes, not understanding that the meaning of "newspaper" is different in these two contexts; it is first an object and second an institution. === Benchmarks and experiments === One argument against the hypothesis that LLMs are stochastic parrot is their results on benchmarks for reasoning, common sense and language understanding. In 2023, some LLMs have shown good results on many language understanding tests, such as the Super General Language Understanding Evaluation (SuperGLUE). GPT-4 scored in the >90th-percentile on the Uniform Bar Examination and achieved 93% accuracy on the MATH benchmark of high-school Olympiad problems, results that exceed rote pattern-matching expectations. Such tests, and the smoothness of many LLM responses, help as many as 51% of AI professionals believe they can truly understand language with enough data, according to a 2022 survey. === Expert rebuttals === Some AI researchers dispute the notion that LLMs merely "parrot" their training data. Geoffrey Hinton, a pioneering figure in neural networks, counters that the metaphor misunderstands the prerequisite for accurate language prediction. He argues that "to predict the next word accurately, you have to understand the sentence", a view he presented on 60 Minutes in 2023. From this perspective, understanding is not an alternative to statistical prediction, but rather an emergent property required to perform it effectively at scale. Hinton also uses logical puzzles to demonstrate that LLMs actually understand language. A 2024 Scientific American investigation described a closed Berkeley workshop where state-of-the-art models solved novel tier-4 mathematics problems and produced coherent proofs, indicating reasoning abilities beyond memorization. The GPT-4 Technical Report showed human-level results on professional and academic exams (e.g., the Uniform Bar Exam and USMLE), challenging the "parrot" characterization. Anthropic conducted mechanistic interpretability research on Claude, using attribution graphs to identify circuits. The research showed how the LLM processes information via chains of fuzzy logical inference, and indicated an ability to plan ahead. They found that Claude 3.5 Haiku "employs remarkably general abstractions", forms "internally generated plans for its future outputs" and "works backwards from its longer-term goals". They noted that "The mechanisms of the model can apparently only be faithfully described using an overwhelmingly large causal graph." They also found that the model includes "mechanisms that could underlie a simple form of metacognition", in that it "thinks about" the level of its own knowledge before reaching its answer. === Interpretability === Another line of evidence against the 'stochastic parrot' claim comes from mechanistic interpretability, a research field dedicated to reverse-engineering LLMs to understand their internal workings. Rather than only observing the model's input-output behavior, these techniques probe the model's internal activations, which can be used to determine if they contain structured representations of the world. The goal is to investigate whether LLMs are merely manipulating surface statistics or if t

    Read more →
  • Organoid intelligence

    Organoid intelligence

    Organoid intelligence (OI) is an emerging field of study in computer science and biology that develops and studies biological wetware computing using 3D cultures of human brain cells (or brain organoids) and brain-machine interface technologies. Such technologies may be referred to as OIs or the nervous filesystem. Organoid intelligent computer systems can be an example of biohybrid systems. == Differences with non-organic computing == As opposed to traditional non-organic silicon-based approaches, OI seeks to use lab-grown cerebral organoids to serve as "biological hardware". While these structures are still far from being able to think like a regular human brain and do not yet possess strong computing capabilities, OI research currently offers the potential to improve the understanding of brain development, learning and memory, potentially finding treatments for neurological disorders such as dementia. Thomas Hartung, a professor from Johns Hopkins University, argued in 2023 that "while silicon-based computers are certainly better with numbers, brains are better at learning." He noted that transistor density in computer chip may be approaching its limits, whereas brains, being wired differently, are more energy-efficient and can store large amounts of information. Some researchers claim that even though human brains are slower than machines at processing simple information, they are far better at processing complex information as brains can deal with fewer and more uncertain data, perform both sequential and parallel processing, being highly heterogenous, use incomplete datasets, and is said to outperform non-organic machines in decision-making. Training OIs involve the process of biological learning (BL) as opposed to machine learning (ML) for AIs. == Bioinformatics in OI == OI generates complex biological data, necessitating sophisticated methods for processing and analysis. Bioinformatics provides the tools and techniques to decipher raw data, uncovering the patterns and insights. Researchers have developed a platform named Neuroplatform for experimenting remotely with brain organoids via an API. == Intended functions == Brain-inspired computing hardware aims to emulate the structure and working principles of the brain and could be used to address current limitations in AI technologies. However, brain-inspired silicon chips are still limited in their ability to fully mimic brain function, as most examples are built on digital electronic principles. One study performed OI computation (which they termed Brainoware) by sending and receiving information from the brain organoid using a high-density multielectrode array. By applying spatiotemporal electrical stimulation, nonlinear dynamics, and fading memory properties, as well as unsupervised learning from training data by reshaping the organoid functional connectivity, the study showed the potential of this technology by using it for speech recognition and nonlinear equation prediction in a reservoir computing framework. == Ethical concerns == While researchers are hoping to use OI and biological computing to complement traditional silicon-based computing, there are also questions about the ethics of such an approach. Concerns include the possibility that an organoid could develop sentience or consciousness, and the question of the relationship between a stem cell donor (for growing the organoid) and the respective OI system.

    Read more →
  • Document mosaicing

    Document mosaicing

    Document mosaicing is a process that stitches multiple, overlapping snapshot images of a document together to produce one large, high resolution composite. The document is slid under a stationary, over-the-desk camera by hand until all parts of the document are snapshotted by the camera's field of view. As the document slid under the camera, all motion of the document is coarsely tracked by the vision system. The document is periodically snapshotted such that the successive snapshots are overlap by about 50%. The system then finds the overlapped pairs and stitches them together repeatedly until all pairs are stitched together as one piece of document. The document mosaicing can be divided into four main processes. Tracking Feature detecting Correspondences establishing Images mosaicing. == Tracking (simple correlation process) == In this process, the motion of the document slid under the camera is coarsely tracked by the system. Tracking is performed by a process called simple correlation process. In the first frame of snapshots, a small patch is extracted from the center of the image as a correlation template. The correlation process is performed in the four times size of the patch area of the next frame. The motion of the paper is indicated by the peak in the correlation function. The peak in the correlation function indicates the motion of the paper. The template is resampled from this frame and the tracking continues until the template reaches the edge of the document. After the template reaches the edge of the document, another snapshot is taken and the tracking process performs repeatedly until the whole document is imaged. The snapshots are stored in an ordered list to facilitate pairing the overlapped images in later processes. == Feature detecting for efficient matching == Feature detection is the process of finding the transformation that aligns one image with another. There are two main approaches for feature detection. Feature-based approach : Motion parameters are estimated from point correspondences. This approach is suitable for the case that there is plenty supply of stable and detectable features. Featureless approach : When the motion between the two images is small, the motion parameters are estimated using optical flow. On the other hand, when the motion between the two images is large, the motion parameters are estimated using generalised cross-correlation. However, this approach requires a computationally expensive resources. Each image is segmented into a hierarchy of columns, lines, and words to match the organised sets of features across images. Skew angle estimation and columns, lines and words finding are the examples of feature detection operations. === Skew angle estimation === Firstly, the angle that the rows of text make with the image raster lines (skew angle) is estimated. It is assumed to lie in the range of ±20°. A small patch of text in the image is selected randomly and then rotated in the range of ±20° until the variance of the pixel intensities of the patch summed along the raster lines is maximised. To ensure that the found skew angle is accurate, the document mosaic system performs calculation at many image patches and derive the final estimation by finding the average of the individual angles weighted by the variance of the pixel intensities of each patch. === Columns, lines and words finding === In this operation, the de-skewed document is intuitively segmented into a hierarchy of columns, lines and words. The sensitivity to illumination and page coloration of the de-skewed document can be removed by applying a Sobel operator to the de-skewed image and thresholding the output to obtain the binary gradient, de-skewed image. The operation can be roughly separated into 3 steps: column segmentation, line segmentation and word segmentation. Columns are easily segmented from the binary gradient, de-skewed images by summing pixels vertically. Baselines of each row are segmented in the same way as the column segmentation process but horizontally. Finally, individual words are segmented by applying the vertical process at each segmented row. These segmentations are important because the document mosaic is created by matching the lower right corners of words in overlapping images pair. Moreover, the segmentation operation can organize the list of images in the context of a hierarchy of rows and column reliably. The segmentation operation involves a considerable amount of summing in the binary gradient, de-skewed images, which done by construct a matrix of partial sums whose elements are given by p i y = ∑ u = 1 i ∑ v = 1 j b u v {\displaystyle p_{iy}=\sum _{u=1}^{i}\sum _{v=1}^{j}b_{uv}} The matrix of partial sums is calculated in one pass through the binary gradient, de-skewed image. ∑ u = u 1 u 2 ∑ v = v 1 v 2 b u v = p u 2 v 2 + p u 1 v 1 − p u 1 v 2 − p u 2 v 1 {\displaystyle \sum _{u=u_{1}}^{u_{2}}\sum _{v=v_{1}}^{v_{2}}b_{uv}=p_{u_{2}v_{2}}+p_{u_{1}v_{1}}-p_{u_{1}v_{2}}-p_{u_{2}v_{1}}} == Correspondences establishing == The two images are now organized in hierarchy of linked lists in following structure : image=list of columns row=list of words column=list of row word=length (in pixels) At the bottom of the structure, the length of each word is recorded for establishing correspondence between two images to reduce to search only the corresponding structures for the groups of words with the matching lengths. === Seed match finding === A seed match finding is done by comparing each row in image1 with each row in image2. The two rows are then compared to each other by every word. If the length (in pixel) of the two words (one from image1 and one from image2) and their immediate neighbours agree with each other within a predefined tolerance threshold (5 pixels, for example), then they are assumed to match. The row of each image is assumed a match if there are three or more word matches between the two rows. The seed match finding operation is terminated when two pairs of consecutive row match are found. === Match list building === After finishing a seed match finding operation, the next process is to build the match list to generate the correspondences points of the two images. The process is done by searching the matching pairs of rows away from the seed row. == Images mosaicing == Given the list of corresponding points of the two images, finding the transformation of the overlapping portion of the images is the next process. Assuming a pinhole camera model, the transformation between pixels (u,v) of image 1 and pixels (u0, v0) of image 2 is demonstrated by a plane-to-plane projectivity. [ s u ′ s v ′ s ] = [ p 11 p 12 p 13 p 21 p 22 p 23 p 31 p 32 1 ] [ u v 1 ] E q .1 {\displaystyle \left[{\begin{array}{c}su'\\sv'\\s\end{array}}\right]=\left[{\begin{array}{ccc}p_{11}&p_{12}&p_{13}\\p_{21}&p_{22}&p_{23}\\p_{31}&p_{32}&1\end{array}}\right]\left[{\begin{array}{c}u\\v\\1\end{array}}\right]\qquad Eq.1} The parameters of the projectivity is found from four pairs of matching points. RANSAC regression technique is used to reject outlying matches and estimate the projectivity from the remaining good matches. The projectivity is fine-tuned using correlation at the corners of the overlapping portion to obtain four correspondences to sub-pixel accuracy. Therefore, image1 is then transformed into image2's coordinate system using Eq.1. The typical result of the process is shown in Figure 5. === Many images coping === Finally, the whole page composition is built up by mapping all the images into the coordinate system of an "anchor" image, which is normally the one nearest the page center. The transformations to the anchor frame are calculated by concatenating the pair-wise transformations found earlier. The raw document mosaic is shown in Figure 6. However, there might be a problem of non-consecutive images that are overlap. This problem can be solved by performing Hierarchical sub-mosaics. As shown in Figure 7, image1 and image2 are registered, as are image3 and image4, creating two sub-mosaics. These two sub-mosaics are later stitched together in another mosaicing process. == Applied areas == There are various areas that the technique of document mosaicing can be applied to such as : Text segmentation of images of documents Document Recognition Interaction with paper on the digital desk Video mosaics for virtual environments Image registration techniques == Relevant research papers == Huang, T.S.; Netravali, A.N. (1994). "Motion and structure from feature correspondences: A review". Proceedings of the IEEE. 82 (2): 252–268. doi:10.1109/5.265351. D.G. Lowe. [1] Perceptual Organization and Visual Recognition. Kluwer Academic Publishers, Boston, 1985. Irani, M.; Peleg, S. (1991). "Improving resolution by image registration". CVGIP: Graphical Models and Image Processing. 53 (3): 231–239. doi:10.1016/1049-9652(91)90045-L. S2CID 4834546. Shivakumara, P.; Kumar, G. Hemantha; Guru, D. S.; Nagabhushan, P. (2006). "

    Read more →
  • AUTINDEX

    AUTINDEX

    AUTINDEX is a commercial text mining software package based on sophisticated linguistics. AUTINDEX, resulting from research in information extraction, is a product of the Institute of Applied Information Sciences (IAI) which is a non-profit institute that has been researching and developing language technology since its foundation in 1985. IAI is an institute affiliated to Saarland University in Saarbrücken, Germany. AUTINDEX is the result of a number of research projects funded by the EU (Project BINDEX), by Deutsche Forschungsgemeinschaft and the German Ministry for Economy. Amongst the latter there are the projects LinSearch, and WISSMER, see also the reference to IAI-Website. The basic functionality of AUTINDEX is the extraction of key words from a document to represent the semantics of the document. Ideally the system is integrated with a thesaurus that defines the standardised terms to be used for key word assignment. AUTINDEX is used in library applications (e.g. integrated in dandelon.com) as well as in high quality (expert) information systems, and in document management and content management environments. Together with AUTINDEX a number of additional software comes along such as an integration with Apache Solr / Lucene to provide a complete information retrieval environment, a classification and categorisation system on the basis of a machine learning software that assigns domains to the document, and a system for searching with semantically similar terms that are collected in so called tag clouds.

    Read more →
  • SimSimi

    SimSimi

    SimSimi is an artificial intelligence conversation program created in 2002 by ISMaker. It grows its artificial intelligence day by day assisted by a feature that allows users to teach it to respond correctly. SimSimi, pronounced as "shim-shimi", is from a Korean word simsim (심심) which means "bored". It has an application designed for Android, Windows Phone and iOS. The application was banned in Thailand in 2012 after users taught it to make responses containing profanity, and to criticise leading politicians. In April 2018, SimSimi was suspended in Brazil due to accusations of sending inappropriate messages, such as sexual language, bullying and even death threats, being labeled as "dangerous" mainly due to its popularity among children, and according to its developer, the suspension of the app in the country "was inevitable because the SimSimi app, at least in the last few days, had a significant negative social impact in Brazil.”

    Read more →
  • Snapshot isolation

    Snapshot isolation

    In databases, and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot. Snapshot isolation has been adopted by several major database management systems, such as InterBase, Firebird, Oracle, MySQL, PostgreSQL, SQL Anywhere, MongoDB and Microsoft SQL Server (2005 and later). The main reason for its adoption is that it allows better performance than serializability, yet still avoids most of the concurrency anomalies that serializability avoids (but not all). In practice snapshot isolation is implemented within multiversion concurrency control (MVCC), where generational values of each data item (versions) are maintained: MVCC is a common way to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations of several last relevant versions (of each object). Snapshot isolation has been used to criticize the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI). In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle. == Definition == A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write–write conflict will cause the transaction to abort. In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies. As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2. If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = −$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 − $200) is now −$200), or T2 happens first and similarly prevents T1 from committing. If the database is under snapshot isolation(MVCC), however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account, and then verifies that the new total is zero, using the other account value that held when the snapshot was taken. Since neither update conflicts, both commit successfully, leaving V1 = V2 = −$100, and V1 + V2 = −$200. Some systems built using multiversion concurrency control (MVCC) may support (only) snapshot isolation to allow transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits. This is convenient because MVCC maintains a series of recent history consistent states. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed. However, MVCC systems (such as MarkLogic) will use locks to serialize writes together with MVCC to obtain some of the performance gains and still support the stronger "serializability" level of isolation. == Workarounds == Potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the serializability property. Materialize the conflict Add a special conflict table, which both transactions update in order to create a direct write–write conflict. Promotion Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write–write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE). In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write–write conflict that would prevent the two from succeeding concurrently. However, this approach violates the normal form. Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set V1 = V1, creating an artificial write–write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible. In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance. == Terminology == Snapshot isolation is called "serializable" mode in Oracle and PostgreSQL versions prior to 9.1, which may cause confusion with the "real serializability" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic. == History == Snapshot isolation arose from work on multiversion concurrency control databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level. InterBase, later owned by Borland, was acknowledged to provide SI rather than full serializability in version 4, and likely permitted write-skew anomalies since its first release in 1985. Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions. In 2008, Cahill et al. showed that write-skew anomalies could be prevented by detecting and aborting "dangerous" triplets of concurrent transactions. This implementation of serializability is well-suited to multiversion concurrency control databases, and has been adopted in PostgreSQL 9.1, where it is known as Serializable Snapshot Isolation (SSI). When used consistently, this eliminates the need for the above workarounds. The downside over snapshot isolation is an increase in aborted transactions. This can perform better or worse than snapshot isolation with the above workarounds, depending on workload.

    Read more →
  • Sketch Engine

    Sketch Engine

    Sketch Engine is a corpus manager and text analysis software developed by Lexical Computing since 2003. Its purpose is to enable people studying language behaviour (lexicographers, researchers in corpus linguistics, translators or language learners) to search large text collections according to complex and linguistically motivated queries. Sketch Engine gained its name after one of the key features, word sketches: one-page, automatic, corpus-derived summaries of a word's grammatical and collocational behaviour. Currently, it supports and provides corpora in over 100 languages. == History of development == Sketch Engine is a product of Lexical Computing, a company founded in 2003 by the lexicographer and research scientist Adam Kilgarriff. He started a collaboration with Pavel Rychlý, a computer scientist working at the Natural Language Processing Centre, Masaryk University, and the developer of Manatee and Bonito (two major parts of the software suite). Kilgarriff also introduced the concept of word sketches. Since then, Sketch Engine has been commercial software, however, all the core features of Manatee and Bonito that were developed by 2003 (and extended since then) are freely available under the GPL license within the NoSketch Engine suite. == Features == A list of tools available in Sketch Engine: Word sketches – a one-page automatic derived summary of a word's grammatical and collocational behaviour Word sketch difference – compares and contrasts two words by analysing their collocations Distributional thesaurus – automated thesaurus for finding words with similar meaning or appearing in the same/similar context Concordance search – finds occurrences of a word form, lemma, phrase, tag or complex structure Collocation search – word co-occurrence analysis displaying the most frequent words (for a search word) which can be regarded as collocation candidates Word lists – generates frequency lists which can be filtered with complex criteria n-grams – generates frequency lists of multi-word expressions Terminology / Keyword extraction (both monolingual and bilingual) – automatic extraction of key words and multi-word terms from texts (based on frequency count and linguistic criteria) Diachronic analysis (Trends) – detecting words which undergo changes in the frequency of use in time (show trending words) Corpus building and management – create corpora from the Web or uploaded texts including part-of-speech tagging and lemmatization which can be used as data mining software Parallel corpus (bilingual) facilities – looking up translation examples (EUR-Lex corpus, Europarl corpus, OPUS corpus, etc.) or building a parallel corpus from own aligned texts Text type analysis – statistics of metadata in the corpus === Keywords and terminology extraction === Sketch Engine can perform automatic term extraction by identifying words typical of a particular corpus, document, or text. Single words and multi-word units can be extracted from monolingual or bilingual texts. The terminology extraction feature provides a list of relevant terms based on comparison with a large corpus of general language. This functionality is also available as a separate service called OneClick Terms with a dedicated interface. === SKELL === A free web service based on Sketch Engine and aimed at language learners and teachers is SKELL (formerly SkELL). It exploits Sketch Engine's proprietary GDEX (Good Dictionary Examples) scoring function to provide authentic example sentences for specific target words. Results are drawn from a special corpus of high-quality texts covering everyday, standard, formal, and professional language and displayed as a concordance. SKELL also includes simplified versions of Sketch Engine's word sketch and thesaurus functions. It has been suggested that SKELL can be used, for instance, to help students understand the meaning and/or usage of a word or phrase; to help teachers wanting to use example sentences in a class; to discover and explore collocates; to create gap-fill exercises; to teach various kinds of homonyms and polysemous words. SKELL was first presented in 2014, when only English was supported. Later, support was added for Russian, Czech, German, Italian and Estonian. == List of text corpora == Sketch Engine provides access to more than 800 text corpora. There are monolingual as well as multilingual corpora of different sizes (from one thousand words up to 85 billion words) and various sources (e.g. web, books, subtitles, legal documents). The list of corpora includes British National Corpus, Brown Corpus, Cambridge Academic English Corpus and Cambridge Learner Corpus, CHILDES corpora of child language, OpenSubtitles (a set of 60 parallel corpora), 24 multilingual corpora of EUR-Lex documents, the TenTen Corpus Family (multi-billion web corpora), and Trends corpora (monitor corpora with daily updates). == Architecture == Sketch Engine consists of three main components: an underlying database management system called Manatee, a web interface search front-end called Bonito, and a web interface for corpus building and management called Corpus Architect. === Manatee === Manatee is a database management system specifically devised for effective indexing of large text corpora. It is based on the idea of inverted indexing (keeping an index of all positions of a given word in the text). It has been used to index text corpora comprising tens of billions of words. Searching corpora indexed by Manatee is performed by formulating queries in the Corpus Query Language (CQL). Manatee is written in C++ and offers an API for a number of other programming languages including Python, Java, Perl and Ruby. Recently, it was rewritten into Go for faster processing of corpus queries. === Bonito === Bonito is a web interface for Manatee providing access to corpus search. In the client–server model, Manatee is the server and Bonito plays the client part. It is written in Python. === Corpus Architect === Corpus Architect is a web interface providing corpus building and management features. It is also written in Python. == Applications == Sketch Engine has been used by major British and other publishing houses for producing dictionaries such as Macmillan English Dictionary, Dictionnaires Le Robert, Oxford University Press or Shogakukan. Four of United Kingdom's five biggest dictionary publishers use Sketch Engine.

    Read more →
  • PhyCV

    PhyCV

    PhyCV is the first computer vision library which utilizes algorithms directly derived from the equations of physics governing physical phenomena. The algorithms appearing in the first release emulate the propagation of light through a physical medium with natural and engineered diffractive properties followed by coherent detection. Unlike traditional algorithms that are a sequence of hand-crafted empirical rules, physics-inspired algorithms leverage physical laws of nature as blueprints. In addition, these algorithms can, in principle, be implemented in real physical devices for fast and efficient computation in the form of analog computing. Currently PhyCV has three algorithms, Phase-Stretch Transform (PST) and Phase-Stretch Adaptive Gradient-Field Extractor (PAGE), and Vision Enhancement via Virtual diffraction and coherent Detection (VEViD). All algorithms have CPU and GPU versions. PhyCV is now available on GitHub and can be installed from pip. == History == Algorithms in PhyCV are inspired by the physics of the photonic time stretch (a hardware technique for ultrafast and single-shot data acquisition). PST is an edge detection algorithm that was open-sourced in 2016 and has 800+ stars and 200+ forks on GitHub. PAGE is a directional edge detection algorithm that was open-sourced in February, 2022. PhyCV was originally developed and open-sourced by Jalali-Lab @ UCLA in May 2022. In the initial release of PhyCV, the original open-sourced code of PST and PAGE is significantly refactored and improved to be modular, more efficient, GPU-accelerated and object-oriented. VEViD is a low-light and color enhancement algorithm that was added to PhyCV in November 2022. == Background == === Phase-Stretch Transform (PST) === Phase-Stretch Transform (PST) is a computationally efficient edge and texture detection algorithm with exceptional performance in visually impaired images. The algorithm transforms the image by emulating propagation of light through a device with engineered diffractive property followed by coherent detection. It has been applied in improving the resolution of MRI image, extracting blood vessels in retina images, dolphin identification, and waste water treatment, single molecule biological imaging, and classification of UAV using micro Doppler imaging. === Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) === Phase-Stretch Adaptive Gradient-Field Extractor (PAGE) is a physics-inspired algorithm for detecting edges and their orientations in digital images at various scales. The algorithm is based on the diffraction equations of optics. Metaphorically speaking, PAGE emulates the physics of birefringent (orientation-dependent) diffractive propagation through a physical device with a specific diffractive structure. The propagation converts a real-valued image into a complex function. Related information is contained in the real and imaginary components of the output. The output represents the phase of the complex function. === Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) === Vision Enhancement via Virtual diffraction and coherent Detection (VEViD) an efficient and interpretable low-light and color enhancement algorithm that reimagines a digital image as a spatially varying metaphoric light field and then subjects the field to the physical processes akin to diffraction and coherent detection. The term “Virtual” captures the deviation from the physical world. The light field is pixelated and the propagation imparts a phase with an arbitrary dependence on frequency which can be different from the quadratic behavior of physical diffraction. VEViD can be further accelerated through mathematical approximations that reduce the computation time without appreciable sacrifice in image quality. A closed-form approximation for VEViD which we call VEViD-lite can achieve up to 200 FPS for 4K video enhancement. == PhyCV on the Edge == Featuring low-dimensionality and high-efficiency, PhyCV is ideal for edge computing applications. In this section, we demonstrate running PhyCV on NVIDIA Jetson Nano in real-time. === NVIDIA Jetson Nano Developer Kit === NVIDIA Jetson Nano Developer Kit is a small- sized and power-efficient platform for edge computing applications. It is equipped with an NVIDIA Maxwell architecture GPU with 128 CUDA cores, a quad-core ARM Cortex-A57 CPU, 4GB 64-bit LPDDR4 RAM, and supports video encoding and decoding up to 4K resolution. Jetson Nano also offers a variety of interfaces for connectivity and expansion, making it ideal for a wide range of AI and IoT applications. In our setup, we connect a USB camera to the Jetson Nano to acquire videos and demonstrate using PhyCV to process the videos in real-time. === Real-time PhyCV on Jetson Nano === We use the Jetson Nano (4GB) with NVIDIA JetPack SDK version 4.6.1, which comes with pre- installed Python 3.6, CUDA 10.2, and OpenCV 4.1.1. We further install PyTorch 1.10 to enable the GPU accelerated PhyCV. We demonstrate the results and metrics of running PhyCV on Jetson Nano in real-time for edge detection and low-light enhancement tasks. For 480p videos, both operations achieve beyond 38 FPS, which is sufficient for most cameras that capture videos at 30 FPS. For 720p videos, PhyCV low-light enhancement can operate at 24 FPS and PhyCV edge detection can operate at 17 FPS. == Highlights == === Modular Code Architecture === The code in PhyCV has a modular design which faithfully follows the physical process from which the algorithm was originated. Both PST and PAGE modules in the PhyCV library emulate the propagation of the input signal (original digital image) through a device with engineered diffractive property followed by coherent (phase) detection. The dispersive propagation applies a phase kernel to the frequency domain of the original image. This process has three steps in general, loading the image, initializing the kernel and applying the kernel. In the implementation of PhyCV, each algorithm is represented as a class in Python and each class has methods that simulate the steps described above. The modular code architecture follows the physics behind the algorithm. Please refer to the source code on GitHub for more details. === GPU Acceleration === PhyCV supports GPU acceleration. The GPU versions of PST and PAGE are built on PyTorch accelerated by the CUDA toolkit. The acceleration is beneficial for applying the algorithms in real-time image video processing and other deep learning tasks. The running time per frame of PhyCV algorithms on CPU (Intel i9-9900K) and GPU (NVIDIA TITAN RTX) for videos at different resolutions are shown below. Note that the PhyCV low-light enhancement operates in the HSV color space, so the running time also includes RGB to HSV conversion. However, for all running times using GPUs, we ignore the time of moving data from CPUs to GPUs and count the algorithm operation time only. == Installation and Examples == Please refer to the GitHub README file for a detailed technical documentation. == Current Limitations == === I/O (Input/Output) Bottleneck for Real-time Video Processing === When dealing with real-time video streams from cameras, the frames are captured and buffered in CPU and have to be moved to GPU to run the GPU-accelerated PhyCV algorithms. This process is time-consuming and it is a common bottleneck for real-time video-processing algorithms. === Lack of Parameter Adaptivity for Different Images === Currently, the parameters of PhyCV algorithms have to be manually tuned for different images. Although a set of pre-selected parameters work relatively well for a wide range of images, the lack of parameter adaptivity for different images remains a limitation for now.

    Read more →