AI Essay Planner

AI Essay Planner — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Attensity

    Attensity was an American company that provided social analytics and engagement applications for social customer relationship management (social CRM). Attensity's text analytics software applications extracted facts, relationships and sentiment from unstructured data. == History == Attensity was founded in 2000. An early investor in Attensity was In-Q-Tel, which funds technology to support the missions of the US Government and the broader DOD. InTTENSITY, an independent company that has combined Inxight with Attensity Software (the only joint development project that combines two InQTel funded software packages), was the exclusive distributor and outlet for Attensity in the Federal Market. In 2009, Attensity Corp., then based in Palo Alto, merged with Germany's Empolis and Living-e AG to form Attensity Group. In 2010, Attensity Group acquired Biz360, a provider of social media monitoring and market intelligence solutions. In early 2012, Attensity Group divested itself of the Empolis business unit via a management buyout; that unit currently conducts business under its pre-merger name. Attensity Group was a closely held private company. Its majority shareholder was Aeris Capital, a private Swiss investment office advising a high-net-worth individual and his charitable foundation. Foundation Capital, Granite Ventures, and Scale Venture Partners were among Biz360's investors and thus became shareholders in Attensity Group. In February 2016, Attensity's IP assets were acquired by InContact, and Attensity closed.

  • Uncertain database

    An uncertain database is a kind of database studied in database theory. The goal of uncertain databases is to manage information about which there is some uncertainty. Uncertain databases make it possible to explicitly represent and manage uncertainty in the data, usually in a succinct way. == Formal definition == At the basis of uncertain databases is the notion of a possible world. Specifically, a possible world of an uncertain database is a (certain) database which is one of the possible realizations of the uncertain database. A given uncertain database typically has more than one, and potentially infinitely many, possible worlds. A formalism for representing uncertain databases then specifies how to succinctly represent a set of possible worlds as one uncertain database. == Types of uncertain databases == Uncertain database models differ in how they represent and quantify these possible worlds: Incomplete databases are a compact representation of the set of possible worlds – the use of NULL in SQL, arguably the most commonplace instantiation of uncertain databases, is an example of an incomplete database model. Probabilistic databases are a compact representation of a probability distribution over the set of possible worlds. Fuzzy databases are a compact representation of a fuzzy set of the possible worlds. Though mostly studied in the relational setting, uncertain database models can also be defined in other data models such as graph databases or XML databases. === Incomplete database === The most common database model is the relational model. Multiple incomplete database models have been defined over the relational model, which form extensions to the relational algebra. 
These have been called Imieliński–Lipski algebras: relations with NULL values (also called Codd tables), c-tables, and v-tables. === Example === The following table is a relation of an incomplete database, described in the formalism of NULL values: There are infinitely many possible worlds for this incomplete database, obtained by replacing the "NULL" values with concrete values. For instance, the following relation is a possible world:
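
    The tables of this example are not reproduced in the excerpt. As a stand-in, the following minimal Python sketch enumerates the possible worlds of a small relation with NULL values. The relation, the column names, and the finite per-column domains are all hypothetical; the finite domains are an assumption made only so that the enumeration terminates, whereas the text above allows infinitely many possible worlds. Each NULL is treated as an independent unknown, as in Codd tables.

```python
from itertools import product

NULL = None  # marker for an unknown value

# Hypothetical relation with columns (name, city); not the table from the excerpt.
relation = [
    ("Alice", "Paris"),
    ("Bob", NULL),
    (NULL, "Oslo"),
]

# Assumed finite domain for each column, so that the enumeration terminates.
domains = [
    ["Alice", "Bob", "Carol"],  # name
    ["Paris", "Oslo"],          # city
]

def possible_worlds(rows, column_domains):
    """Yield every relation obtained by replacing each NULL with a domain value."""
    # Positions (row index, column index) of all NULL cells.
    nulls = [(i, j) for i, row in enumerate(rows)
             for j, value in enumerate(row) if value is NULL]
    # One list of candidate values per NULL, taken from its column's domain.
    choices = [column_domains[j] for _, j in nulls]
    for values in product(*choices):
        world = [list(row) for row in rows]
        for (i, j), value in zip(nulls, values):
            world[i][j] = value
        # A relation is a set of tuples, so duplicate rows collapse.
        yield frozenset(tuple(row) for row in world)

worlds = list(possible_worlds(relation, domains))
print(len(worlds))  # 2 cities x 3 names = 6 combinations
```

    A probabilistic database would additionally attach a probability to each of these worlds; that extension is not shown here.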

  • AI: When a Robot Writes a Play

    AI: When a Robot Writes a Play (in Czech: AI: Když robot píše hru) is a 2021 experimental theatre play in which 90% of the script was automatically generated by artificial intelligence (the GPT-2 language model). The play is in the Czech language, but an English version of the script also exists. == Creation == The play is the first result of the THEaiTRE research project, which aims to commemorate the centenary of the R.U.R. play by Karel Čapek by investigating to what extent artificial intelligence could be used to create theatre play scripts. The script of the play was created using the THEaiTRobot tool, based on the GPT-2 language model. First, the play dramaturge, David Košťák, described the initial setting of each scene in a few sentences, and wrote the first line for each character. Next, THEaiTRobot suggested a continuation of the script, which the dramaturge could use, reject, or use in part before letting the tool generate a new continuation. Another option was to manually insert another line or a scenic remark. The script was generated in English and was automatically translated to Czech by the state-of-the-art CUBBITT machine translation tool. The resulting script was then further post-edited by the dramaturge. The resulting script was made freely available for non-commercial use both in English and in Czech, with manually inserted texts and manual edits marked. The analysis shows that 90% of the English script is automatically generated, with 10% manually written or manually post-edited. In the Czech script, a larger number of edits were made, but the analysis claims that these additional edits are corrections of errors of the automated translation and stylistic corrections which do not change the meaning of the lines as represented by the English script, but rather bring the Czech script closer to the English one. == Characters == The play contains 9 characters. The Robot appears in all the scenes, while each of the other characters appears in only one scene. 
Robot – The lead character, a male humanoid robot. Master – An old man, the creator of the Robot. Boy – A schoolboy. Masseuse – A sex worker in a brothel. Stranger – An engineer. Man. Psychologist. Administrator – A female clerk at an employment agency. Actress – A film actress and a model in a robot-like costume. == Plot == The play is composed of 8 scenes. It tells the story of a humanoid robot who encounters 8 other characters and engages in various typically human situations and activities, related to death, love, sex, violence, etc. The individual scenes are not tightly linked, but there are some linking points, such as the central character of the robot or some repeated and developing themes, such as the robot's search for love. The scenes often contain some absurd turns and it is often hard to find sense in them. It is therefore a very complicated piece to interpret, requiring the director and the actors to invest a lot of effort and creativity in finding a meaningful interpretation which would not deviate from the script. In the interpretation by Švanda theatre, which premiered the play and also participated in the creation of the script, the scenes typically contain non-verbally expressed content which can add a lot to the meaning of the scene compared to what is contained in the actual script (as the script only contains the lines said by the characters). === Scene 1: Death === The play opens with the Robot parting with his dying Master. The Master gives the Robot several last lessons and talks with him about death, soul, and love. === Scene 2: Sense of Humour === In the second scene, the Robot meets a sad and angry Boy, who complains that he wants to go to school, that his girlfriend is crazy, that he wants to buy a car, etc. The Robot tries to help the Boy by giving him advice, but the Boy's reactions are quite negative and irritated. 
The Boy then repeatedly asks the Robot to tell him a joke; the Robot keeps refusing, but ultimately tells the following joke: When you are dead. When your children are dead. When your grandchildren are dead, I will be still alive. === Scene 3: Nightclub === The Robot wants to feel pleasure, so he goes to a "night club" (a brothel), where he meets a "Masseuse" (a prostitute). The Robot is initially "a bit cold", but eventually manages to enjoy the experience and falls in love with the Masseuse. In the Švanda theatre performance, the Robot and the Masseuse seem to have a sort of virtual sex without touching each other, reminiscent of the sex scene in Demolition Man. === Scene 4: Fear of the Dark === It is the night. The Robot is standing under a lamp, unable to move away from the light as he finds that he is afraid of the dark. He meets a Stranger, an engineer who tells him that robots don't have feelings and that people cannot be trusted, and keeps hurting him. In the Švanda theatre performance, the Man repeatedly zaps the Robot with some kind of electric pulse. === Scene 5: Killer Robot === A Man approaches the Robot and repeatedly asks him to kill him. Instead, the Robot sticks a finger into the Man's anus, which leads to an argument between the Man and the Robot. === Scene 6: Burn Out === The Robot meets a Psychologist, who keeps asking him lots of questions regarding his life, burnout feeling, love, relationships, and emotions. They also talk about the Robot using a device called emotion machine which helps him to get rid of stress. === Scene 7: Search for Job === The Robot comes to an employment agency. He meets an Administrator and asks her to help him find a job. He expresses the wish to become an actor, and talks about his experience as a clown. He reveals his name to be Troy McClure, which is a character from The Simpsons who is an actor. 
In the Švanda theatre performance, the Administrator starts to seduce the Robot once his name is revealed, which he keeps ignoring; the Administrator then becomes irritated. === Scene 8: Love at First Sight === The Robot meets a human Actress in a robotic costume and falls in love with her immediately. The Actress is at first reluctant, but the Robot manages to seduce her and she also falls in love with him. The Robot tells her about a binary world, in which he lives and where he will also take her. Ultimately, the Actress agrees, and the whole play concludes with the Robot and the Actress promising each other to always be together. In the Švanda theatre performance, the Robot does not have a physical body in this scene; we can only hear his voice and see a pulsating light (based on the line in the script where the Robot says: "I have no body. So I don't need to wear clothes. You can't see me, you only hear me."), and the Actress eventually also agrees to lose her physical body so that she can be with the Robot forever. == Theatrical performances == The play premiered on 26 February 2021 in Švanda Theatre in Prague, Czech Republic, directed by Daniel Hrbek. Due to the COVID-19 pandemic, the play was not performed in front of a live audience, but it was broadcast online, in Czech with English subtitles. The play was followed by a panel discussion by the project members and experts on artificial intelligence. The premiere was viewed by 13,498 spectators worldwide. A short trailer of the premiere is available on YouTube. In 2021, after the opening of the theatres in the Czech Republic to spectators, the play could be viewed at Švanda Theatre. The performance takes approximately 60 minutes, and is followed by a discussion of the creators with the audience. The derniere is planned for 4 February 2023. == Reception == The play received a number of reviews, both in its country of origin and internationally. 
It is praised as the first of its kind, although some reviewers note the similarity to previous works, such as the musical Beyond the Fence, the play Lifestyle of the Richard and Family, or the short movie Sunspring; however, these works used less advanced technology, and either were very short (Sunspring) or necessitated a larger amount of human intervention. The reviewers note that the script is far from perfect, with many inconsistencies and nonsensical parts, and conclude that the technology is definitely not yet ready to replace human authors; however, some find some parts of the script frighteningly human-like. The amount of human intervention is a somewhat controversial topic, with some reviewers finding the human influence too large (especially in interpreting the script and staging the play), while others feel that a greater amount of human intervention would have been favorable as this could greatly improve the quality of the play. The reviews also frequently comment on the amount of sex, violence and strong language in the play; this can be attributed to the method used for creating the script, where the GPT-2 language model reflects topics and language common in the human-written articles on the internet that were used to train the model. Furthermore, some r

  • Browsing

    Browsing is a kind of orienting strategy. It is supposed to identify something of relevance for the browsing organism. In the context of humans, it is a metaphor taken from the animal kingdom. It is used, for example, about people browsing open shelves in libraries, window shopping, or browsing databases or the Internet. In library and information science, it is an important subject, both purely theoretically and as applied science aiming at designing interfaces which support browsing activities for the user. == Definition == In 2011, Birger Hjørland provided the following definition: "Browsing is a quick examination of the relevance of a number of objects which may or may not lead to a closer examination or acquisition/selection of (some of) these objects. It is a kind of orienting strategy that is formed by our 'theories', 'expectations' and 'subjectivity'." == Controversies == As with any kind of human psychology, browsing can be understood in biological, behavioral, or cognitive terms on the one hand or in social, historical, and cultural terms on the other hand. In 2007, Marcia Bates researched browsing from "behavioural" approaches, while Hjørland (2011a+b) defended a social view. Bates found that browsing is rooted in our history as exploratory, motile animals hunting for food and nesting opportunities. According to Hjørland (2011a), on the other hand, Marcia Bates' browsing for information about browsing is governed by her behavioral assumptions, while Hjørland's browsing for information about browsing is governed by his socio-cultural understanding of human psychology. In short: Human browsing is based on our conceptions and interests. === Is browsing a random activity? === Browsing is often understood as a random activity. Dictionary.com, for example, has this definition: "to glance at random through a book, magazine, etc.". Hjørland suggests, however, that browsing is an activity that is governed by our metatheories. 
We may dynamically change our theories and conceptions but when we browse, the activity is governed by the interests, conceptions, priorities and metatheories that we have at that time. Therefore, browsing is not totally random. == Browsing versus analytical search strategies == In 1997, Gary Marchionini wrote: "A fundamental distinction is made between analytical and browsing strategies [...]. Analytical strategies depend on careful planning, the recall of query terms, and iterative query reformulations and examinations of results. Browsing strategies are heuristic and opportunistic and depend on recognizing relevant information. Analytic strategies are batch oriented and half duplex (turn taking) like human conversation, whereas browsing strategies are more interactive, real-time exchanges and collaborations between the information seeker and the information system. Browsing strategies demand a lower cognitive load in advance and a steadier attentional load throughout the information-seeking process." == Orienting strategies == Some sociologists, such as Berger and Zelditch in 1993, Wagner in 1984, and Wagner & Berger in 1985, have used the term "orienting strategies". They find that orienting strategies should be understood as metatheories: "Consider the very large proportion of sociological theory that is in the form of metatheory. It is discussion about theory: about what concepts it should include, about how those concepts should be linked, and about how theory should be studied. Similar to Kuhn’s paradigms, theories of this sort provide guidelines or strategies for understanding social phenomena and suggest the proper orientation of the theorist to these phenomena; they are orienting strategies. Textbooks in theory frequently focus on orienting strategies such as functionalism, exchange, or ethnomethodology." Sociologists thus use metatheories as orienting strategies. 
We may generalize and say that all people use metatheories as orienting strategies and that this is what directs our attention and also our browsing, even when we are not conscious of it.

  • Adrozek

    Adrozek is malware that injects fake ads into online search results. Microsoft announced the malware threat on 10 December 2020, and noted that many different browsers are affected, including Google Chrome, Microsoft Edge, Mozilla Firefox and Yandex Browser. The malware was first detected in May 2020 and, at its peak in August 2020, controlled over 30,000 devices a day. In its December 2020 announcement, however, Microsoft reported "hundreds of thousands" of infected devices worldwide between May and September 2020. According to Microsoft, if not detected and blocked, Adrozek adds browser extensions, modifies a specific DLL per target browser, and changes browser settings to insert additional, unauthorized ads into web pages, often on top of legitimate ads from search engines. For each user tricked into clicking on the fake ads, the scammers earn affiliate advertising dollars. The malware has been observed to extract device data and, in some cases, steal credentials, sending them to remote servers. Users may unintentionally install the malware through a drive-by download, by visiting a tampered website, opening an e-mail attachment, or clicking on a deceptive link or a deceptive pop-up window. The main malware program is downloaded to the “Program Files” folder using file names such as Audiolava.exe, QuickAudio.exe, and converter.exe. According to PC Magazine, a good way to avoid, or mitigate, infection by Adrozek is to keep browser and related software programs up to date.

  • Applications of artificial intelligence

    Artificial intelligence is the capability of computational systems to perform tasks that are typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. Artificial intelligence has been used in applications throughout industry and academia. Within the field of artificial intelligence, there are multiple subfields. The subfield of machine learning has been used for various scientific and commercial purposes, including language translation, image recognition, decision-making, credit scoring, and e-commerce. In recent years, massive advancements have been made in the field of generative artificial intelligence, which uses generative models to generate text, images, videos, and other forms of data. This article describes applications of AI in different sectors. == Agriculture == In agriculture, AI has been proposed as a way for farmers to identify areas that need irrigation, fertilization, or pesticide treatments to increase yields, thereby improving efficiency. AI has been used to attempt to classify emotions in the calls of livestock pigs, automate greenhouses, detect diseases and pests, and optimize irrigation. == AI-assisted software development == == Architecture and design == == Business == A 2023 study found that generative AI increased productivity by 15% in contact centers. Another 2023 study found it increased productivity by up to 40% in writing tasks. An August 2025 review by MIT found that of surveyed companies, 95% did not report any improvement in revenue from the use of AI. A September 2025 article by the Harvard Business Review describes how increased use of AI does not automatically lead to increases in revenue or actual productivity. Referring to "AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task", the article coins the term workslop. 
Per studies done in collaboration with the Stanford Social Media Lab, workslop does not improve productivity and undermines trust and collaboration among colleagues. In telehealth, agentic AI is reportedly facilitating the creation of large business models (millions in annual profit) with 1-2 employees, such as MEDVi, which as of August 2025 only had 2 employees and ~$75M in annual profit for GLP-1 weight-loss telehealth services. == Chatbots == == Computer science == === Programming assistance === ==== AI-assisted software development ==== AI can be used for real-time code completion, chat, and automated test generation. These tools are typically integrated with editors and IDEs as plugins. AI-assisted software development systems differ in functionality, quality, speed, and approach to privacy. Creating software primarily via AI is known as "vibe coding". Code created or suggested by AI can be incorrect or inefficient. The use of AI-assisted coding can potentially speed up software development, but can also slow down the process by creating more work when debugging and testing. The rush to prematurely adopt AI technology can also incur additional technical debt. AI also requires additional consideration and careful review for cybersecurity, since AI coding software is trained on a wide range of code of inconsistent quality and often replicates poor practices. ==== Neural network design ==== AI can be used to create other AIs. For example, around November 2017, Google's AutoML project to evolve new neural net topologies created NASNet, a system optimized for ImageNet and COCO. NASNet's performance exceeded all previously published performance on ImageNet. ==== Quantum computing ==== Research and development of quantum computers has been performed with machine learning algorithms. 
For example, there is a prototype photonic quantum memristive device for neuromorphic computers (NC) and artificial neural networks, and there is neuromorphic computing that uses quantum materials, with a variety of potential applications related to neuromorphic computing. The use of quantum machine learning for quantum simulators has been proposed for solving physics and chemistry problems. === Historical contributions === AI researchers have created many tools to solve the most difficult problems in computer science. Many of their inventions have been adopted by mainstream computer science and are no longer considered AI. All of the following were originally developed in AI laboratories: time sharing; interactive interpreters; graphical user interfaces and the computer mouse; rapid application development environments; the linked list data structure; automatic storage management; symbolic programming; functional programming; dynamic programming; object-oriented programming; optical character recognition; and constraint satisfaction. == Customer service == === Human resources === AI programs have been used in hiring processes to screen resumes and rank candidates based on their qualifications, predict a candidate's likelihood of success in a given role, and automate repetitive communication tasks using chatbots. Studies on these programs have identified tendencies for gender bias, favoring male names and male-coded characteristics, as well as bias against disabled candidates and racial minorities. === Online and telephone customer service === AI underlies avatars (automated online assistants) on web pages. It can reduce operation and training costs. Pypestream automated customer service for its mobile application to streamline communication with customers. A Google app analyzes language and converts speech into text. The platform can identify angry customers through their language and respond appropriately. 
Amazon uses a chatbot for customer service that can perform tasks like checking the status of an order, cancelling orders, offering refunds and connecting the customer with a human representative. Generative AI (GenAI), such as ChatGPT, is increasingly used in business to automate tasks and enhance decision-making. === Hospitality === In the hospitality industry, AI is used to reduce repetitive tasks, analyze trends, interact with guests, and predict customer needs. AI hotel services come in the form of chatbots, applications, virtual voice assistants and service robots. == Education == In educational institutions, AI has been used to automate routine tasks such as attendance tracking, grading, and marking. AI tools have also been used to monitor student progress and analyze learning behaviors, with the goal of facilitating timely interventions for students facing academic challenges. == Energy and environment == === Energy system === The U.S. Department of Energy wrote in an April 2024 report that AI may have applications in modeling power grids, reviewing federal permits with large language models, predicting levels of renewable energy production, and improving the planning process for electric vehicle charging networks. Other studies have suggested that machine learning can be used for energy consumption prediction and scheduling, e.g. to help with renewable energy intermittency management (see also: smart grid and climate change mitigation in the power grid). === Environmental monitoring === Autonomous ships that monitor the ocean, AI-driven satellite data analysis, passive acoustics or remote sensing and other applications of environmental monitoring make use of machine learning. For example, "Global Plastic Watch" is an AI-based satellite monitoring platform for analyzing and tracking plastic waste sites. It aims to help prevent plastic pollution, primarily ocean pollution, by helping to identify who mismanages plastic waste and where it is dumped into oceans. 
=== Early-warning systems === Machine learning can be used to spot early-warning signs of disasters and environmental issues, possibly including natural pandemics, earthquakes, landslides, heavy rainfall, long-term water supply vulnerability, tipping-points of ecosystem collapse, cyanobacterial bloom outbreaks, and droughts. === Economic and social challenges === The University of Southern California launched the Center for Artificial Intelligence in Society, with the goal of using AI to address problems such as homelessness. Stanford researchers use AI to analyze satellite images to identify high poverty areas. == Entertainment and media == === Media === AI applications analyze media content such as movies, TV programs, advertisement videos or user-generated content. The solutions often involve computer vision. Typical scenarios include the analysis of images using object recognition or face recognition techniques, or the analysis of video for recognizing scenes, objects or faces. AI-based media analysis can facilitate media search, the creation of descriptive keywords for content, content policy monitoring (such as verifying the suitability of content for a particular TV viewing time), speech to text for archival or other purposes, and the detection of logos, products or celebrity faces for ad placement. Motion interpolation Pixel-art scaling algorithms Image scaling Imag

  • Artificial Intelligence Applications Institute

    The Artificial Intelligence Applications Institute (AIAI) at the School of Informatics at the University of Edinburgh is a non-profit technology transfer organisation that promotes research in the field of artificial intelligence. == History == The Artificial Intelligence Applications Institute (AIAI) was founded in 1983 at the University of Edinburgh as a specialist research and technology-transfer unit focusing on the practical uses of artificial intelligence (AI). The institute was established by Professor Jim Howe and colleagues from the Science and Engineering Research Council (SERC) Special Interest Group in AI in the Department of Artificial Intelligence, with a mission to apply AI techniques to solve real-world industrial and governmental problems. Under the directorship of Austin Tate, who served from 1985 to 2019, AIAI became one of the leading UK research centres devoted to AI programming systems, intelligent planning systems, decision support, and knowledge-based engineering. It collaborated with both academic partners and international organisations such as the European Space Agency and the UK Ministry of Defence. In 2001, AIAI joined the newly created Centre for Intelligent Systems and their Applications (CISA) within the University's School of Informatics. In December 2019, the institute was renamed the Artificial Intelligence and its Applications Institute to reflect a broader integration of fundamental and applied AI research. 
== Research programmes == AIAI’s research spans multiple areas of artificial intelligence, including: AI programming systems – Edinburgh Prolog, Edinburgh Common Lisp, Logo; Knowledge representation and reasoning – development of ontologies, rule-based inference, and semantic modelling; Automated planning and scheduling – intelligent task management systems used in aerospace, manufacturing, and emergency response; Natural language processing and intelligent agents – interaction frameworks for human–computer collaboration; AI ethics and decision-making – research into responsible deployment and evaluation of autonomous systems. The institute also contributes to interdisciplinary fields such as computational creativity, explainable AI, and human–AI interaction. AIAI maintains close collaboration with the Bayes Centre and the Alan Turing Institute through joint research programmes and doctoral training initiatives. == Technology transfer and impact == From its inception, AIAI has combined academic research with technology-transfer activity, offering professional training, industrial consultancy, and bespoke software systems. It pioneered one of the earliest knowledge-based project-management systems, O-Plan, which later evolved into the I-Plan framework used for autonomous planning and workflow management.

  • Information logistics

    Information Logistics (IL) deals with the flow of information between human or machine actors within or between any number of organizations that in turn form a value creating network (see, e.g.). IL is closely related to information management, information operations and information technology. == Definition == The term Information Logistics (IL) may be used in either of two ways: Firstly, it can be defined as "managing and controlling information handling processes optimally with respect to time (flow time and capacity), storage, distribution and presentation in such a way that it contributes to company results in concurrence with the costs of capturing (creation, searching, maintenance etc)." (Petri,2017) Thus IL utilizes logistic principles to optimize information handling. Secondly, IL can be seen as a concept using information technology to optimize logistics. A term which is closely related to the first meaning of Information Logistics is Data Logistics, a concept used in Computer Networking. "The study of solutions to problems in Computer Systems that flexibly span resources and services relating to Data Movement, Data Storage and Data Processing." [ref?] Systems that support general Data Logistics solutions thus must span the traditionally separate fields of Networking, File/Database Systems and Process Management. Data Logistics is a more general form of the term Logistical Networking, used as the name of a particular network storage architecture and software stack. == Goal == The goal of Information Logistics is to deliver the right product, consisting of the right information element, in the right format, at the right place at the right time for the right people at the right price and all of this is customer demand driven. If this goal is to be achieved, knowledge workers are best equipped with information for the task at hand for improved interaction with its customers and machines are enabled to respond automatically to meaningful information. 
Methods for achieving the goal are: the analysis of information demand; intelligent information storage; the optimization of the flow of information; maintaining both security and organizational flexibility; and integrated information and billing solutions. The expression was formed by the Indian mathematician and librarian S. R. Ranganathan. The supply of a product is part of the discipline of logistics. The purpose of this discipline is described as follows: logistics is the study of the planning and the effective and efficient running of supply. Contemporary logistics focuses on the organization, planning, control and implementation of the flow of goods, money, information and people. Information Logistics focuses on information. Information (from Latin informare: "shape, shapes, instruct") means in a general sense everything that adds knowledge and thus reduces ignorance or lack of precision. In a stricter sense, raw data only becomes information to those who can interpret it. Interpreting relevant, related information produces insight that either leads to existing, or eventually builds new, knowledge. == Information element == An information element (IE) is an information component that is located in the organizational value chain. The combination of certain IEs leads to an information product (IP), which is any final product in the form of information that a person needs to have. When a larger number of different IEs is required, this often results in more capacity planning problems and inherently leads to non-delivery of the IP. To illustrate the concept of an IP, an example is shown of a bottleneck analysis in HR (by J. Willems 2008). Here, the illustration shows how the information elements (e.g. qualifications) build up the information product (e.g. HR file). 
== Data logistics == Data logistics is a concept that developed independently of information logistics in the 1990s, in response to the explosion of Internet content and traffic due to the invention of the World Wide Web (WWW). Some motivations for the emergence of interest in Data Logistics included the following. First, the incorporation of network hyperlinks into content encoded in HTML encouraged users to freely dereference those links without regard to, or in many cases without even having any knowledge of, the identity (much less the geographical or network topological location) of the target Web server. Second, the growth in the volume of Web hits, combined with the steady increase in the size of Web-delivered objects such as images, audio and video clips, resulted in the localized overloading of the bandwidth and processing resources of the local and/or wide area network and/or the Web server infrastructure. The resulting Internet bottleneck can cause Web clients to experience poor performance or complete denial of access to servers that host high volume sites (the so-called Slashdot effect). Third, the growth in all Internet traffic, especially across international telecommunication links, resulted in stress to institutional infrastructure and high costs on networks that billed Internet traffic on a per-use basis. Much of this traffic was redundant, the result of repeated requests by many independent users to access the same stored files and content. Fourth, large files and content retrieved from distant Web servers were often delayed by the high latencies of long and complex Internet paths. These factors led to interest in the use of large scale storage (and to a lesser extent, processing) resources to cache the response to network requests, first at the Internet endpoint using a Web browser cache and later at intermediate network locations using shared network caches. 
This line of development also gave rise to Web server replication and other techniques for offloading and distributing the work of delivering large volume Web services to widely dispersed client communities, ultimately resulting in the creation of modern Content delivery networks. At the same time, research efforts in server replication and content delivery gave rise to a number of related projects and strategies, including Logistical Networking (LN). The name LN was intended as an analogy to physical supply chain logistics, in which goods are not only carried from source to destination on networks of roads, but are also stored at warehouses located throughout the transportation infrastructure. This led to a nomenclature in which LN network storage resources are termed "storage depots". The principles that underpin LN have been abstracted into the more general study of scheduling and optimization across the traditional infrastructure silos of Storage, Networking and Processing which was named Data Logistics. === Illustrative examples of data logistics === Data Caching and Replication are classic examples of Data Logistics solutions to problems in Computer Systems and Networking with high data access latencies or data transfer resource limitations. It works mainly across the areas of data transfer and data storage. Dynamic Compression in data transfer is another example which uses computational resources to minimize the bandwidth requirements of data transfer.
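As a rough illustration of the caching and compression examples above, the following Python sketch places a small least-recently-used cache in front of a slow fetch and stores each response compressed. The names `fetch_from_origin` and `CachingDepot`, the example URL and the capacity of two entries are hypothetical choices for this sketch and are not part of any Logistical Networking software; only `zlib` and `collections.OrderedDict` are real standard-library APIs.

```python
import zlib
from collections import OrderedDict


def fetch_from_origin(url: str) -> bytes:
    """Hypothetical stand-in for a slow, distant origin server."""
    return (f"content of {url} " * 50).encode("utf-8")


class CachingDepot:
    """Illustrative intermediate cache that stores compressed responses."""

    def __init__(self, capacity: int = 2) -> None:
        self.capacity = capacity
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.origin_fetches = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            # Cache hit: mark as most recently used and decompress.
            self.store.move_to_end(url)
            return zlib.decompress(self.store[url])
        # Cache miss: pay for one transfer from the origin, then keep a
        # compressed copy (computation traded for storage and bandwidth).
        self.origin_fetches += 1
        body = fetch_from_origin(url)
        self.store[url] = zlib.compress(body)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        return body


depot = CachingDepot()
first = depot.get("http://example.org/a")
second = depot.get("http://example.org/a")  # served from the depot
```

The second request never reaches the origin, which is the redundancy that shared caches were meant to remove; the compressed copy shows computation being spent to reduce storage and transfer volume.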

  • Ibotta

    Ibotta

    Ibotta, Inc. is an American mobile technology company headquartered in Denver, Colorado. Founded in 2011, the company offers cash back rewards on various purchases through its Ibotta Performance Network and direct to consumer app. Ibotta partners with CPG (consumer packaged goods) brands and network publishers to provide these rewards. As of 2024, the company operates solely in the United States. The company's rewards-as-a-service offering, the Ibotta Performance Network, went live in 2022. In August 2019, Ibotta received a $1 billion valuation after its Series D funding, and in 2023, the company surpassed $1.5 billion cash rewards paid to over 50 million consumers since the company's founding. Ibotta became a publicly traded company in April 2024 with a listing on the New York Stock Exchange. As of September 2025, Ibotta is trading at approximately $27.13 per share, marking a 69% decline from its initial public offering price of $88 per share on April 18, 2024. == History == === Founding through early 2019 === Ibotta was founded by current CEO Bryan Leach. The company was incorporated in 2011 and the app launched to both the App Store and Google Play stores in 2012. Early investors included entrepreneur and computer scientist Jim Clark and Tom “TJ” Jermoluk, Chairman of @Home Network. In 2015, Ibotta expanded beyond item level grocery, adding the ability to get cash back on in-store retail purchases. In 2016, in-app mobile commerce began, allowing users to navigate from the Ibotta app to its partners' apps to earn cash back on purchases. In 2016 with a Series C investment, Ibotta had raised over $73 million in funding. In March of that year, Ibotta partnered with Anheuser-Busch to offer cash back for adults who purchased its products. In May, the company partnered with LiveRamp so that companies could use their CRM data to create segmented, personalized campaigns. 
At the time, the company had around 200 full- and part-time employees and moved from offices in Lower Downtown Denver (LoDo) to a 40,000-square-foot office in the central Denver business district. A year later, the company had to expand to a second floor as it added almost another 100 employees. In 2017, Ibotta added cash back for Uber to its app as well as cash back rewards for online and mobile purchases. In 2018, Ibotta was listed on the Inc. 5,000 list as one of the fastest growing private companies in the U.S. A year later, in January 2019, the Ibotta app had been downloaded more than 30 million times with users receiving a reported $500 million in cash back rewards. That year, Ibotta was the largest mobile company in Colorado with six million monthly active users. === August 2019 to present === In August 2019, Ibotta was valued at $1 billion, following a Series D round of funding. The round was led by Koch Disruptive Technologies, a subsidiary of Koch Industries. 2019 was also the year the company introduced Pay with Ibotta, which allowed users to complete purchases at key retailers on the Ibotta app and earn instant cash back in the process. With that new service, users were able to enter their purchase total and use a QR code to checkout and receive immediate cash back. In 2020, the company partnered with Trees for the Future to plant up to 1 million trees as part of an Earth Month campaign to raise awareness about the waste of unused paper coupons. In response to the COVID-19 pandemic, Ibotta partnered with CPG brands in their “Here to Help” campaign and together committed over $10 million in cash back to American consumers. The company added the ability to earn cash back from online grocery pick-up and delivery orders. Later that year, Ibotta started its free Thanksgiving program, providing users with 100% cash back on select groceries needed for a Thanksgiving meal. By 2022, the company had provided approximately 10 million Thanksgiving meals. 
In 2021, Ibotta acquired the company OctoShop (originally InStok), a shopping browser extension company. The OctoShop app enables users to compare prices across stores and set restock and price-drop alerts. In April 2022, the Ibotta Performance Network (IPN) was launched. The IPN allows brands to deliver digital offers to consumers through third party publishers. Retailers including Walmart, Dollar General and Family Dollar, food delivery services including Instacart, and convenience stores including Shell are all part of the Ibotta Performance Network. This pay-per-sales or success-based performance network reaches over 200 million consumers. On April 18, 2024, Ibotta had its initial public offering (IPO), trading on the New York Stock Exchange (NYSE) under the ticker symbol IBTA. It was the largest technology IPO in Colorado history. In October 2025, Ibotta announced a partnership with technology and analytics company Circana, integrating Circana's Household Lift measurement into Ibotta campaigns to give CPG brands an increased understanding of the impact of their promotional campaigns. On November 3, 2025, Ibotta launched LiveLift, a tool for companies to measure the return on investment of digital promotions, in order to optimize performance marketing goals. === Athletic partnerships === Ibotta became the official jersey patch partner of the New Orleans Pelicans, a professional men's basketball team in the National Basketball Association (NBA), for the 2020–2021 and 2023–2024 seasons. Ibotta became the official jersey patch partner of the 2023 NBA champion Denver Nuggets basketball team beginning in the 2023–2024 season. In March 2023, F1 driver Logan Sargeant, the first U.S. racer to compete in F1 since 2015, partnered with Ibotta. The Ibotta logo was displayed on Sargeant's racing helmet throughout his F1 career. 
In June 2023, UConn Huskies women's basketball player Paige Bueckers entered into a "name, image, and likeness" (NIL) promotional agreement with Ibotta. According to a press release by Ibotta, the company has agreements with The Brandr Group, which finds NIL opportunities for women college athletes, and the Pearpop social media marketing platform to promote Ibotta. == Legal issues == In April 2025, shareholders filed a class action lawsuit—Fortune v. Ibotta, Inc., in the U.S. District Court for the District of Colorado (Case No. 25-cv-01213)—alleging that the registration statement in connection with Ibotta’s April 2024 initial public offering omitted material information. The complaint claims that, although Ibotta disclosed detailed terms for its contract with Walmart Inc., it failed to warn investors that its agreement with The Kroger Co., its second-largest client, was terminable at will and thus could be canceled without warning, creating a misleading impression of stability.

  • Data quality

    Data quality

    Data quality refers to the state of qualitative or quantitative pieces of information. There are many definitions of data quality, but data is generally considered high quality if it is "fit for [its] intended uses in operations, decision making and planning". Data is deemed of high quality if it correctly represents the real-world construct to which it refers. Apart from these definitions, as the number of data sources increases, the question of internal data consistency becomes significant, regardless of fitness for use for any particular external purpose. People's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. When this is the case, businesses may adopt recognised international standards for data quality (See #International Standards for Data Quality below). Data governance can also be used to form agreed upon definitions and standards, including international standards, for data quality. In such cases, data cleansing, including standardization, may be required in order to ensure data quality. == Definitions == Defining data quality is difficult due to the many contexts data are used in, as well as the varying perspectives among end users, producers, and custodians of data. 
From a consumer perspective, data quality is: "data that are fit for use by data consumers"; data "meeting or exceeding consumer expectations"; data that "satisfies the requirements of its intended use". From a business perspective, data quality is: data that are "'fit for use' in their intended operational, decision-making and other roles" or that exhibit "'conformance to standards' that have been set, so that fitness for use is achieved"; data that "are fit for their intended uses in operations, decision making and planning"; "the capability of data to satisfy the stated business, system, and technical requirements of an enterprise". From a standards-based perspective, data quality is: the "degree to which a set of inherent characteristics (quality dimensions) of an object (data) fulfills requirements"; "the usefulness, accuracy, and correctness of data for its application". Arguably, in all these cases, "data quality" is a comparison of the actual state of a particular set of data to a desired state, with the desired state being typically referred to as "fit for use," "to specification," "meeting consumer expectations," "free of defect," or "meeting requirements." These expectations, specifications, and requirements are usually defined by one or more individuals or groups, standards organizations, laws and regulations, business policies, or software development policies. 
== Dimensions of data quality == Drilling down further, those expectations, specifications, and requirements are stated in terms of characteristics or dimensions of the data, such as: accessibility or availability; accuracy or correctness; comparability; completeness or comprehensiveness; consistency, coherence, or clarity; credibility, reliability, or reputation; flexibility; plausibility; relevance, pertinence, or usefulness; timeliness or latency; uniqueness; validity or reasonableness. A systematic scoping review of the literature suggests that data quality dimensions and methods with real world data are not consistent in the literature, and as a result quality assessments are challenging due to the complex and heterogeneous nature of these data. == International standards for data quality == ISO 8000 is an international standard for data quality. Managed by the International Organization for Standardization, the ISO 8000 standards address and describe general aspects of data quality including: principles, vocabulary and measurement; data governance; data quality management; data quality assessment; quality of master data, including exchange of characteristic data and identifiers; quality of industrial data. == History == Before the rise of inexpensive computer data storage, massive mainframe computers were used to maintain name and address data for delivery services. This was so that mail could be properly routed to its destination. The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events. Government agencies began to make postal data available to a few service companies to cross-reference customer data with the National Change of Address registry (NCOA). This technology saved large companies millions of dollars in comparison to manual correction of customer data. 
Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately. Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available. Companies with an emphasis on marketing often focused their quality efforts on name and address information, but data quality is recognized as an important property of all types of data. Principles of data quality can be applied to supply chain data, transactional data, and nearly every other category of data found. For example, making supply chain data conform to a certain standard has value to an organization by: 1) avoiding overstocking of similar but slightly different stock; 2) avoiding false stock-out; 3) improving the understanding of vendor purchases to negotiate volume discounts; and 4) avoiding logistics costs in stocking and shipping parts across a large organization. For companies with significant research efforts, data quality can include developing protocols for research methods, reducing measurement error, bounds checking of data, cross tabulation, modeling and outlier detection, verifying data integrity, etc. == Overview == There are a number of theoretical frameworks for understanding data quality. A systems-theoretical approach influenced by American pragmatism expands the definition of data quality to include information quality, and emphasizes the inclusiveness of the fundamental dimensions of accuracy and precision on the basis of the theory of science (Ivanov, 1972). One framework, dubbed "Zero Defect Data" (Hansen, 1991) adapts the principles of statistical process control to data quality. Another framework seeks to integrate the product perspective (conformance to specifications) and the service perspective (meeting consumers' expectations) (Kahn et al. 2002). 
Another framework is based in semiotics to evaluate the quality of the form, meaning and use of the data (Price and Shanks, 2004). One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996). A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. Nearly 200 such terms have been identified and there is little agreement in their nature (are these concepts, goals or criteria?), their definitions or measures (Wang et al., 1993). Software engineers may recognize this as a similar problem to "ilities". MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ). This program grew out of the work done by Hansen on the "Zero Defect Data" framework (Hansen, 1991). In practice, data quality is a concern for professionals involved with a wide range of information systems, ranging from data warehousing and business intelligence to customer relationship management and supply chain management. One industry study estimated the total cost to the U.S. economy of data quality problems at over U.S. $600 billion per annum (Eckerson, 2002). Incorrect data, which includes invalid and outdated information, can originate from different data sources: through data entry, or through data migration and conversion projects. In 2002, the USPS and PricewaterhouseCoopers released a report stating that 23.6 percent of all U.S. mail sent is incorrectly addressed. One reason contact data becomes stale very quickly in the average database is that more than 45 million Americans change their address every year. 
In fact, the problem is such a concern that companies are beginning to set up a data governance team whose sole role in the corporation is to be responsible for data quality. In some organizations, this data governance function has been established as part of a larger Regulatory Compliance function - a recognition of the importance of Data/Information Quality to organizations. Problems with data quality don't only arise from incorrect data; inconsistent data is a problem as well. Eliminating data shadow systems and centralizing data in a warehouse is one of the initiatives a company can take to ensure data consistency. En
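Some of the dimensions listed above can be turned into simple measurable ratios. The following Python sketch computes completeness, uniqueness and validity over a handful of made-up customer records; the records, the field names and the five-digit ZIP rule are assumptions for illustration only and do not come from ISO 8000 or any of the cited frameworks.

```python
import re

# Made-up records for illustration; real data would come from a database.
records = [
    {"id": 1, "name": "Ada Park", "zip": "80202"},
    {"id": 2, "name": "", "zip": "8020"},
    {"id": 2, "name": "Lee Wong", "zip": "94105"},
    {"id": 4, "name": "Sam Diaz", "zip": None},
]

ZIP_PATTERN = re.compile(r"\d{5}")  # assumed validity rule for this sketch


def is_present(value) -> bool:
    """A value counts as present if it is neither None nor empty."""
    return value not in (None, "")


def completeness(rows, field) -> float:
    """Fraction of rows in which the field is filled in."""
    return sum(1 for r in rows if is_present(r.get(field))) / len(rows)


def uniqueness(rows, field) -> float:
    """Fraction of distinct values among all values of the field."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)


def validity(rows, field, pattern) -> float:
    """Fraction of present values that match the expected pattern."""
    present = [r[field] for r in rows if is_present(r.get(field))]
    return sum(1 for v in present if pattern.fullmatch(v)) / len(present)


name_completeness = completeness(records, "name")
id_uniqueness = uniqueness(records, "id")
zip_validity = validity(records, "zip", ZIP_PATTERN)
```

Each function compares the actual state of the data with a desired state, which is the sense of "data quality" used in the definitions; which ratios matter, and what thresholds are acceptable, depends on the intended use.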

  • In-place algorithm

    In-place algorithm

    In computer science, an in-place algorithm is an algorithm that operates directly on the input data structure without requiring extra space proportional to the input size. In other words, it modifies the input in place, without creating a separate copy of the data structure. An algorithm which is not in-place is sometimes called not-in-place or out-of-place. In-place can have slightly different meanings. In its strictest form, the algorithm can only have a constant amount of extra space, counting everything including function calls and pointers. However, this form is very limited as simply having an index to a length n array requires O(log n) bits. More broadly, in-place means that the algorithm does not use extra space for manipulating the input but may require a small though non-constant extra space for its operation. Usually, this space is O(log n), though sometimes anything in o(n) is allowed. Note that space complexity also has varied choices in whether or not to count the index lengths as part of the space used. Often, the space complexity is given in terms of the number of indices or pointers needed, ignoring their length. In this article, we refer to total space complexity (DSPACE), counting pointer lengths. Therefore, the space requirements here have an extra log n factor compared to an analysis that ignores the lengths of indices and pointers. An algorithm may or may not count the output as part of its space usage. Since in-place algorithms usually overwrite their input with output, no additional space is needed. When writing the output to write-only memory or a stream, it may be more appropriate to only consider the working space of the algorithm. In theoretical applications such as log-space reductions, it is more typical to always ignore output space (in these cases it is more essential that the output is write-only). 
== Examples == Given an array a of n items, suppose we want an array that holds the same elements in reversed order and to dispose of the original. One seemingly simple way to do this is to create a new array of equal size, fill it with copies from a in the appropriate order and then delete a.

    function reverse(a[0..n - 1])
        allocate b[0..n - 1]
        for i from 0 to n - 1
            b[n - 1 - i] := a[i]
        return b

Unfortunately, this requires O(n) extra space for having the arrays a and b available simultaneously. Also, allocation and deallocation are often slow operations. Since we no longer need a, we can instead overwrite it with its own reversal using this in-place algorithm, which will only need a constant number (2) of integers for the auxiliary variables i and tmp, no matter how large the array is.

    function reverse_in_place(a[0..n - 1])
        for i from 0 to floor((n - 2)/2)
            tmp := a[i]
            a[i] := a[n - 1 - i]
            a[n - 1 - i] := tmp

As another example, many sorting algorithms rearrange arrays into sorted order in-place, including: bubble sort, comb sort, selection sort, insertion sort, heapsort, and Shell sort. These algorithms require only a few pointers, so their space complexity is O(log n). Quicksort operates in-place on the data to be sorted. However, quicksort requires O(log n) stack space pointers to keep track of the subarrays in its divide and conquer strategy. Consequently, quicksort needs O(log² n) additional space. Although this non-constant space technically takes quicksort out of the in-place category, quicksort and other algorithms needing only O(log n) additional pointers are usually considered in-place algorithms. Most selection algorithms are also in-place, although some considerably rearrange the input array in the process of finding the final, constant-sized result. Some text manipulation algorithms such as trim and reverse may be done in-place. 
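For readers who prefer runnable code, the two reversal routines above can be sketched in Python as follows. This is an illustrative translation, not part of the original pseudocode: the function names are chosen here, and the in-place version uses two converging indices instead of the floor((n - 2)/2) loop bound, which visits the same pairs.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reverse_copy(a: List[T]) -> List[T]:
    """Out-of-place reversal: allocates a second list of length n."""
    n = len(a)
    b: List[T] = [a[0]] * n if n else []  # placeholder contents, overwritten below
    for i in range(n):
        b[n - 1 - i] = a[i]
    return b


def reverse_in_place(a: List[T]) -> None:
    """In-place reversal: only two index variables besides the input."""
    i, j = 0, len(a) - 1
    while i < j:
        # Swap the outermost unswapped pair, then move both indices inward.
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1


items = [1, 2, 3, 4, 5]
copied = reverse_copy(items)   # items is left untouched
reverse_in_place(items)        # items itself is now reversed
```

The copy needs O(n) extra storage for `b`, while the in-place version keeps only the indices `i` and `j`, each of which takes O(log n) bits under the accounting used in this article.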
== In computational complexity == In computational complexity theory, the strict definition of in-place algorithms includes all algorithms with O(1) space complexity, the class DSPACE(1). This class is very limited; it equals the regular languages. In fact, it does not even include any of the examples listed above. Algorithms in L, the class of problems requiring O(log n) additional space, are usually considered to be in-place. This class is more in line with the practical definition, as it allows numbers of size n as pointers or indices. This expanded definition still excludes quicksort, however, because of its recursive calls. Identifying the in-place algorithms with L has some interesting implications; for example, it means that there is a (rather complex) in-place algorithm to determine whether a path exists between two nodes in an undirected graph, a problem that requires O(n) extra space using typical algorithms such as depth-first search (a visited bit for each node). This in turn yields in-place algorithms for problems such as determining if a graph is bipartite or testing whether two graphs have the same number of connected components. == Role of randomness == In many cases, the space requirements of an algorithm can be drastically cut by using a randomized algorithm. For example, if one wishes to know if two vertices in a graph of n vertices are in the same connected component of the graph, there is no known simple, deterministic, in-place algorithm to determine this. However, if we simply start at one vertex and perform a random walk of about 20n³ steps, the chance that we will stumble across the other vertex provided that it is in the same component is very high. Similarly, there are simple randomized in-place algorithms for primality testing such as the Miller–Rabin primality test, and there are also simple in-place randomized factoring algorithms such as Pollard's rho algorithm. 
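The random-walk idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the graph is given as a read-only adjacency-list dictionary, the function name `probably_connected` is made up here, and the step budget simply follows the 20n³ figure quoted above.

```python
import random
from typing import Dict, List


def probably_connected(adj: Dict[int, List[int]], s: int, t: int,
                       rng: random.Random) -> bool:
    """Random-walk test of whether s and t share a connected component.

    Besides the read-only input, only the current vertex and a step
    counter are stored. A True answer is always correct; a False answer
    may be wrong with small probability when s and t are connected.
    """
    n = len(adj)
    current = s
    for _ in range(20 * n ** 3):
        if current == t:
            return True
        if not adj[current]:
            return False  # isolated vertex: the walk cannot move
        current = rng.choice(adj[current])
    return current == t


# Two components: the path 0 - 1 - 2 and the single edge 3 - 4.
graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
rng = random.Random(0)
same_component = probably_connected(graph, 0, 2, rng)
different_component = probably_connected(graph, 0, 3, rng)
```

Unlike depth-first search, the walk keeps no visited set, which is where the space saving comes from; the price is a one-sided chance of error and a much longer running time.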
== In functional programming == Functional programming languages often discourage or do not support explicit in-place algorithms that overwrite data, since this is a type of side effect; instead, they only allow new data to be constructed. However, good functional language compilers will often recognize when an object very similar to an existing one is created and then the old one is thrown away, and will optimize this into a simple mutation "under the hood". Note that it is possible in principle to carefully construct in-place algorithms that do not modify data (unless the data is no longer being used), but this is rarely done in practice.

  • Causal AI

    Causal AI

    Causal AI is a technique in artificial intelligence that builds a causal model and can thereby make inferences using causality rather than just correlation. One practical use for causal AI is for organisations to explain decision-making and the causes for a decision. Systems based on causal AI, by identifying the underlying web of causality for a behaviour or event, provide insights that solely predictive AI models might fail to extract from historical data. An analysis of causality may be used to supplement human decisions in situations where understanding the causes behind an outcome is necessary, such as quantifying the impact of different interventions, policy decisions or performing scenario planning. A 2024 paper from Google DeepMind demonstrated mathematically that "Any agent capable of adapting to a sufficiently large set of distributional shifts must have learned a causal model". The paper offers the interpretation that learning to generalise beyond the original training set requires learning a causal model, concluding that causal AI is necessary for artificial general intelligence. == History == The concept of causal AI and the limits of machine learning were raised by Judea Pearl, the Turing Award-winning computer scientist and philosopher, in 2018's The Book of Why: The New Science of Cause and Effect. Pearl asserted: “Machines' lack of understanding of causal relations is perhaps the biggest roadblock to giving them human-level intelligence.” In 2020, Columbia University established a Causal AI Lab under Director Elias Bareinboim. Professor Bareinboim's research focuses on causal and counterfactual inference and their applications to data-driven fields in the health and social sciences as well as artificial intelligence and machine learning. Technological research and consulting firm Gartner for the first time included causal AI in its 2022 Hype Cycle report, citing it as one of five critical technologies in accelerated AI automation. 
Causal AI is closely related to but distinct from fields such as causal inference, explainable AI and causal reasoning. While causal inference focuses on estimating cause-effect relationships (often from observational data), causal AI emphasises the integration of those causal models into AI systems for prediction, planning and adaptation.
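The difference between inference from correlation and inference from a causal model can be illustrated with a toy simulation. The sketch below assumes a made-up structural model in which a hidden variable Z drives both X and Y, while X has no effect on Y at all; the coefficients, sample size and function names are arbitrary choices for this illustration and are not taken from any causal AI system.

```python
import random
from typing import List


def slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of y on x (covariance over variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(n: int, rng: random.Random, intervene: bool) -> float:
    """Sample the toy model and return the fitted slope of Y on X."""
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)               # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)           # do(X): X is set independently of Z
        else:
            x = z + rng.gauss(0, 1)       # observed X is driven by Z
        y = 2 * z + rng.gauss(0, 1)       # Y depends on Z only, never on X
        xs.append(x)
        ys.append(y)
    return slope(xs, ys)


rng = random.Random(0)
observational_slope = simulate(20000, rng, intervene=False)
interventional_slope = simulate(20000, rng, intervene=True)
```

In this model the observational slope comes out close to 1 even though changing X does nothing to Y, whereas the slope under intervention is close to 0. A purely predictive model fitted to the observational data would report the first number; a model that encodes the causal structure can answer the second, interventional question.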

  • Key–value database

    Key–value database

    A key-value database, or key-value store, is a data storage paradigm designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them. These records are stored and retrieved using a key that uniquely identifies the record, and is used to find the data within the database. Key-value databases differ from the better known relational databases (RDB). RDBs pre-define the data structure in the database as a series of tables containing fields with well-defined data types. Exposing the data types to the database program allows it to apply various optimizations. In contrast, key-value systems treat the value as opaque to the database itself, and typically support only simple operations such as storing, retrieving, updating, and deleting a value by its key. This offers considerable flexibility and makes such systems well suited to low-latency, high-throughput workloads dominated by direct key lookups, but less suitable for applications that require complex queries or explicit relationships among records. A lack of standardization, limited transaction support, and relatively simple query interfaces long restricted many key-value systems to specialized uses, but the rapid move to cloud computing after 2010 helped drive renewed interest in them as part of the broader NoSQL movement. Some graph databases, such as ArangoDB, are also key–value databases internally, adding the concept of relationships (pointers) between records as a first-class data type. == Types and examples == Key–value systems span a wide consistency spectrum, from eventually consistent designs to strongly consistent or serializable ones, and some allow the consistency level to be configured as part of the trade-off against latency and availability. 
Renewed interest in key–value and other NoSQL systems was driven in part by the demands of big data, distributed, and cloud applications. Their scalability and availability made them attractive for cloud data management, although limited transaction support, low-level query interfaces, and the lack of standardization remained obstacles to wider adoption. Some maintain data in memory (RAM), while others employ solid-state drives or rotating disks. Some key–value systems add additional structure to their keys. For example, Oracle NoSQL Database organizes records using composite keys with "major" and "minor" components, an arrangement that Oracle compares to a directory-path structure in a file system. More generally, however, key–value stores are defined by their use of unique keys associated with opaque values and by their emphasis on simple key-based operations. Unix included dbm (database manager), a minimal database library written by Ken Thompson for managing associative arrays with a single key and hash-based access. Later implementations and related libraries included sdbm, GNU dbm (gdbm), and Berkeley DB. A more recent example is RocksDB, a persistent key–value storage engine developed at Facebook and designed for large-scale applications. Other examples include in-memory systems such as Memcached and Redis, and persistent systems such as Berkeley DB, Riak, and Voldemort.
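The core interface described above, unique keys mapped to opaque values with only simple key-based operations, can be sketched as follows. This is a minimal in-memory illustration, not the API of any of the systems named; the class name `KeyValueStore`, its methods and the `"user:42"` key format are hypothetical.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory store: values are opaque bytes, looked up by key only."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Store or overwrite the value for a key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the stored bytes, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        """Remove a key; report whether it was present."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()

# The application, not the store, decides how a record is encoded.
profile = {"name": "Ada", "languages": ["en", "fr"]}
store.put("user:42", json.dumps(profile).encode("utf-8"))

raw = store.get("user:42")
loaded = json.loads(raw.decode("utf-8")) if raw is not None else None
removed = store.delete("user:42")
```

Because the store never inspects the bytes, it cannot answer a question such as "which users speak French" without the application reading every value, which is the trade-off against relational systems noted above.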

    Read more →
  • Metadata management

    Metadata management

    Metadata management involves managing metadata about other data, whereby this "other data" is generally referred to as content data. The term is used most often in relation to digital media, but older forms of metadata are catalogs, dictionaries, and taxonomies. For example, the Dewey Decimal Classification is a metadata management system developed in 1876 for libraries. == Metadata schema == Metadata management is the end-to-end process and governance framework for creating, controlling, enhancing, attributing, defining and managing a metadata schema, model or other structured aggregation system, either independently or within a repository, together with the associated supporting processes (often to enable the management of content). For web-based systems, URLs, images, video etc. may be referenced from a triples table of object, attribute and value. == Scope == With specific knowledge domains, the boundaries of the metadata for each must be managed, since a general ontology is not useful to experts in one field whose language is knowledge-domain specific. == Metadata Manager == In the process of developing a knowledge management solution, creating a metadata schema, and building a system in which metadata is managed, a dedicated resource may be appointed to maintain adherence to metadata standards as defined by data owners as well as general best practice. This person is responsible for curation of the business and technical layers of the metadata schema, and is commonly involved with strategy and implementation. A metadata manager is not required to master all aspects, or be involved with everything concerning the solution, but should understand as much of the process as possible to ensure that a relevant schema is developed. == Metadata management over time == Managing the metadata in a knowledge management solution is an important step in a metadata strategy. It is part of the strategy to make sure that the metadata are complete, current and correct at any given time. 
Managing a metadata project is also about making sure that users of the system are aware of the possibilities allowed by a well-designed metadata system and how to maximize the benefits of metadata. Regularly monitoring the metadata to ensure that the schema remains relevant is advised. === Wikipedia metadata === Wikipedia is a project that actively manages metadata for its articles and files. For example, volunteer editors carefully curate new biographical articles based on the notability (claim to fame), name, birth, and/or death dates. Similarly, volunteer editors carefully curate new architectural articles based on name, municipality, or geo coordinates. When new articles with a valid alternate spelling are added to Wikipedia that match up to existing articles based on metadata, these are then manually checked and if needed, tagged for merging. When new articles are added that are considered out of scope or otherwise unfit for Wikipedia, these are nominated for deletion. To help keep track of metadata on Wikipedia, the new Wikimedia project Wikidata was established in 2012.
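
    The triples table of object, attribute and value mentioned under "Metadata schema" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class name TripleTable, its methods, and the example URLs and attribute names are all hypothetical and do not come from any particular metadata system.

```python
from collections import defaultdict


class TripleTable:
    """A hypothetical in-memory table of (object, attribute, value) rows."""

    def __init__(self):
        self._rows = []

    def add(self, obj, attribute, value):
        """Record one metadata statement about `obj`."""
        self._rows.append((obj, attribute, value))

    def describe(self, obj):
        """Collect all attributes of one object as {attribute: [values]}."""
        result = defaultdict(list)
        for o, attribute, value in self._rows:
            if o == obj:
                result[attribute].append(value)
        return dict(result)

    def find(self, attribute, value):
        """Return the objects that carry the given attribute/value pair."""
        return [o for o, a, v in self._rows if a == attribute and v == value]


def build_example():
    """Build a small table describing two made-up web resources."""
    table = TripleTable()
    table.add("https://example.org/img/1.png", "type", "image")
    table.add("https://example.org/img/1.png", "subject", "library")
    table.add("https://example.org/img/1.png", "subject", "catalog")
    table.add("https://example.org/video/2.mp4", "type", "video")
    table.add("https://example.org/video/2.mp4", "subject", "library")
    return table


if __name__ == "__main__":
    table = build_example()
    print(table.describe("https://example.org/img/1.png"))
    print(table.find("subject", "library"))
```

    Because every statement is a separate row, new attributes can be introduced without changing the table's shape, which is one reason the layout suits metadata whose schema evolves over time.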

    Read more →
  • Data janitor

    Data janitor

    A data janitor is a person who works to take big data and condense it into useful amounts of information. Also known as a "data wrangler", a data janitor sifts through data for companies in the information technology industry. A multitude of start-ups rely on large amounts of data, so a data janitor works to help these businesses with this basic, but difficult process of interpreting data. While it is a commonly held belief that data janitor work is fully automated, many data scientists are employed primarily as data janitors. The information technology industry has been increasingly turning towards new sources of data gathered on consumers, so data janitors have become more commonplace in recent years.

    Read more →