AI Chat Character

AI Chat Character — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Python (programming language)

    Python (programming language)

    Python is a high-level, general-purpose programming language that emphasizes code readability, simplicity, and ease-of-writing with the use of significant indentation, "plain English" naming, an extensive ("batteries-included") standard library, and garbage collection. Python supports multiple programming paradigms but with an emphasis on object-oriented programming and dynamic typing. Guido van Rossum began working on Python in the late 1980s as a successor to the ABC programming language. Python 3.0, released in 2008, was a major revision and not completely backward-compatible with earlier versions. Beginning with Python 3.5, capabilities and keywords for typing were added to the language, allowing optional static typing. As of 2026, the Python Software Foundation supports Python 3.10, 3.11, 3.12, 3.13, and 3.14, following the project's annual release cycle and five-year support policy. Python 3.15 is currently in the alpha development phase, and the stable release is expected to launch in October 2026. Earlier versions in the 3.x series have reached end-of-life and no longer receive security updates. Python has gained extensive use in the machine learning community. It is widely taught as an introductory programming language. Since 2003, Python has consistently ranked among the top ten most popular programming languages in the TIOBE Programming Community Index, which ranks programming languages based on searches across 24 platforms. == History == Python was conceived in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands. It was designed as a successor to the ABC programming language, which was inspired by SETL, capable of exception handling and interfacing with the Amoeba operating system. Python implementation began in December 1989. Van Rossum first released it in 1991 as Python 0.9.0. 
Van Rossum assumed sole responsibility for the project, as the lead developer, until 12 July 2018, when he announced his "permanent vacation" from responsibilities as Python's "benevolent dictator for life" (BDFL); this title was bestowed on him by the Python community to reflect his long-term commitment as the project's chief decision-maker. (He has since come out of retirement and is self-titled "BDFL-emeritus".) In January 2019, active Python core developers elected a five-member Steering Council to lead the project. The name Python derives from the British comedy series Monty Python's Flying Circus. (See § Naming.) Python 2.0 was released on 16 October 2000, featuring many new features such as list comprehensions, cycle-detecting garbage collection, reference counting, and Unicode support. Python 2.7's end-of-life was initially set for 2015, and then postponed to 2020 out of concern that a large body of existing code could not easily be forward-ported to Python 3. It no longer receives security patches or updates. While Python 2.7 and older versions are officially unsupported, a different unofficial Python implementation, PyPy, continues to support Python 2, i.e., "2.7.18+" (plus 3.11), with the plus signifying (at least some) "backported security updates". Python 3.0 was released on 3 December 2008, and was a major revision and not completely backward-compatible with earlier versions, with some new semantics and changed syntax. Python 2.7.18, released in 2020, was the last release of Python 2. Several releases in the Python 3.x series have added new syntax to the language, and made a few (considered very minor) backward-incompatible changes. As of May 2026, Python 3.14.5 is the latest stable release. All older 3.x versions had a security update down to Python 3.9.24 then again with 3.9.25, the final version in 3.9 series. Python 3.10 is, since November 2025, the oldest supported branch. 
Python 3.15 has an alpha released, and Android has an official downloadable executable available for Python 3.14. Releases receive two years of full support followed by three years of security support. == Design philosophy and features == Python is a multi-paradigm programming language. Object-oriented programming and structured programming are fully supported, and many of their features support functional programming and aspect-oriented programming – including metaprogramming and metaobjects. Many other paradigms are supported via extensions, including design by contract and logic programming. Python is often referred to as a 'glue language' because it is purposely designed to be able to integrate components written in other languages. Python uses dynamic typing and a combination of reference counting and a cycle-detecting garbage collector for memory management. It uses dynamic name resolution (late binding), which binds method and variable names during program execution. Python's design offers some support for functional programming in the "Lisp tradition". It has filter, map, and reduce functions; list comprehensions, dictionaries, sets, and generator expressions. The standard library has two modules (itertools and functools) that implement functional tools borrowed from Haskell and Standard ML. Python's core philosophy is summarized in the Zen of Python (PEP 20) written by Tim Peters, which includes aphorisms such as these: Explicit is better than implicit. Simple is better than complex. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity, errors should never pass silently, unless explicitly silenced. There should be one-- and preferably only one --obvious way to do it. However, Python has received criticism for violating these principles and adding unnecessary language bloat. Responses to these criticisms note that the Zen of Python is a guideline rather than a rule. 
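As a rough illustration of the functional tools named above (filter, map, reduce, generator expressions, and the itertools and functools modules), here is a minimal sketch. The variable names and sample data are made up for this example.

```python
from functools import reduce
from itertools import count, islice

numbers = [1, 2, 3, 4, 5, 6]

# filter and map return lazy iterators, so list() is used to materialize them.
evens = list(filter(lambda n: n % 2 == 0, numbers))
squares = list(map(lambda n: n * n, evens))

# reduce folds a sequence into a single value, starting from 0 here.
total = reduce(lambda acc, n: acc + n, squares, 0)

# The same computation written as a generator expression.
comp_total = sum(n * n for n in numbers if n % 2 == 0)

# itertools can work with infinite streams: take the first three odd squares.
first_odd_squares = list(islice((n * n for n in count(1, 2)), 3))
```

Both `total` and `comp_total` hold the same value; which spelling reads better is largely a matter of style.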
The addition of some new features had been controversial: Guido van Rossum resigned as Benevolent Dictator for Life after conflict about adding the assignment expression operator in Python 3.8. Nevertheless, rather than building all functionality into its core, Python was designed to be highly extensible through modules. This compact modularity has made it particularly popular as a means of adding programmable interfaces to existing applications. Van Rossum's vision of a small core language with a large standard library and an easily extensible interpreter stemmed from his frustrations with ABC, which represented the opposite approach. Python claims to strive for a simpler, less-cluttered syntax and grammar, while giving developers a choice in their coding methodology. Python lacks do .. while loops, which Rossum considered harmful. In contrast to Perl's motto "there is more than one way to do it", Python advocates an approach where "there should be one – and preferably only one – obvious way to do it". In practice, however, Python provides many ways to achieve a given goal. There are at least three ways to format a string literal, with no certainty as to which one a programmer should use. Alex Martelli is a Fellow at the Python Software Foundation and Python book author; he wrote that "To describe something as 'clever' is not considered a compliment in the Python culture." Python's developers typically prioritize readability over performance. For example, they reject patches to non-critical parts of the CPython reference implementation that would offer increases in speed that do not justify the cost of clarity and readability. Execution speed can be improved by moving speed-critical functions to extension modules written in languages such as C, or by using a just-in-time compiler like PyPy. Also, it is possible to transpile to other languages. 
However, this approach either fails to achieve the expected speed-up, since Python is a very dynamic language, or only a restricted subset of Python is compiled (with potential minor semantic changes). Python is meant to be a fun language to use. This goal is reflected in the name – a tribute to the British comedy group Monty Python – and in playful approaches to some tutorials and reference materials. For instance, some code examples use the terms "spam" and "eggs" (in reference to a Monty Python sketch), rather than the typical terms "foo" and "bar". A common neologism in the Python community is pythonic, which has a broad range of meanings related to program style: Pythonic code may use Python idioms well; be natural or show fluency in the language; or conform with Python's minimalist philosophy and emphasis on readability. === Enhancement Proposals === A Python Enhancement Proposal (PEP) is a design document that either provides information to the Python community or proposes a new feature for Python. PEPs are intended to explain new processes in Python, provide naming conventions, or document processes in the language. PEPs are overseen by the Python Steering Council. There are three kinds of PEPs: Standards Track, Informational, and Process, each with its own meaning. They were first introduced in 2000, in

  • Enterprise bus matrix

    Enterprise bus matrix

    The enterprise bus matrix is a data warehouse planning tool and model created by Ralph Kimball, and is part of the data warehouse bus architecture. The matrix is the logical definition of one of the core concepts of Kimball's approach to dimensional modeling: the conformed dimension. The bus matrix defines part of the data warehouse bus architecture and is an output of the business requirements phase in the Kimball lifecycle. It is applied in the following phases of dimensional modeling and development of the data warehouse. The matrix can be categorized as a hybrid model, being part technical design tool, part project management tool and part communication tool. == Background == The need for an enterprise bus matrix stems from the way one goes about creating the overall data warehouse environment. Historically there have been two approaches: a structured, centralized and planned approach, and a more loosely defined, department-specific approach in which solutions are developed in a more independent manner. Autonomous projects can result in a range of isolated stovepipe data marts. Naturally each approach has its issues: the visionary approach often struggles with long delivery cycles and slow reaction as needs emerge and scope issues arise. On the other hand, the development of isolated data marts leads to stovepipe systems that lack synergy in development. Over time this approach will lead to a so-called data-mart-in-a-box architecture where the lack of interoperability and cohesion is apparent, and which can hinder the realization of an overall enterprise data warehouse. As an attempt to handle this issue, Ralph Kimball introduced the enterprise bus. == Description == The purpose of the bus matrix is one of high abstraction and visionary planning at the data warehouse architectural level. 
By dictating coherency in the development and implementation of an overall data warehouse, the bus architecture approach enables an overall vision of the broader enterprise integration and consistency while at the same time dividing the problem into more manageable parts – all in a technology- and software-independent manner. The bus matrix and architecture build upon the concept of conformed dimensions, creating a structure of common dimensions that ideally can be used across the enterprise by all business processes related to the data warehouse and the corresponding fact tables from which they derive their context. According to Kimball and Margy Ross's article "Differences of Opinion", the enterprise data warehouse built on the bus architecture "identifies and enforces the relationship between business process metrics (facts) and descriptive attributes (dimensions)". The concept of a bus is well known in the language of information technology, and is what reflects the conformed dimension concept in the data warehouse, creating the skeletal structure where all parts of a system connect, ensuring interoperability and consistency of data while also allowing for future expansion. This makes the conformed dimensions act as the integration 'glue', creating a robust backbone for the enterprise data warehouse.
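To make the matrix idea concrete, the sketch below models a bus matrix as a mapping from business processes (rows) to the dimensions they use (columns). The process and dimension names are hypothetical and not taken from Kimball's material, and counting how many processes share a dimension is only a rough stand-in for deciding which dimensions need to be conformed.

```python
# Hypothetical bus matrix: each business process lists the dimensions it uses.
bus_matrix = {
    "retail sales": {"date", "product", "store", "customer"},
    "inventory": {"date", "product", "store"},
    "purchase orders": {"date", "product", "vendor"},
}

def shared_dimensions(matrix, min_processes=2):
    """Return dimensions used by at least `min_processes` business processes."""
    counts = {}
    for dimensions in matrix.values():
        for dim in dimensions:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(dim for dim, n in counts.items() if n >= min_processes)

# Dimensions shared across processes are the candidates for conforming.
candidates = shared_dimensions(bus_matrix)
```

In this toy matrix the shared columns are date, product and store; customer and vendor appear in only one row each.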

  • Hindley–Milner type system

    Hindley–Milner type system

    A Hindley–Milner (HM) type system is a classical type system for the lambda calculus with parametric polymorphism. It is also known as Damas–Milner or Damas–Hindley–Milner. It was first described by J. Roger Hindley and later rediscovered by Robin Milner. Luis Damas contributed a close formal analysis and proof of the method in his PhD thesis. Among HM's more notable properties are its completeness and its ability to infer the most general type of a given program without programmer-supplied type annotations or other hints. Algorithm W is an efficient type inference method in practice and has been successfully applied on large code bases, although it has a high theoretical complexity. HM is preferably used for functional programming languages. It was first implemented as part of the type system of the programming language ML. Since then, HM has been extended in various ways, most notably with type class constraints like those in Haskell. == Introduction == As a type inference method, Hindley–Milner is able to deduce the types of variables, expressions and functions from programs written in an entirely untyped style. Being scope sensitive, it is not limited to deriving the types only from a small portion of source code, but rather from complete programs or modules. Being able to cope with parametric types, too, it is core to the type systems of many functional programming languages. It was first applied in this manner in the ML programming language. The origin is the type inference algorithm for the simply typed lambda calculus that was devised by Haskell Curry and Robert Feys in 1958. In 1969, J. Roger Hindley extended this work and proved that their algorithm always inferred the most general type. In 1978, Robin Milner, independently of Hindley's work, provided an equivalent algorithm, Algorithm W. In 1982, Luis Damas finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references. === Monomorphism vs. 
polymorphism === In the simply typed lambda calculus, types $T$ are either atomic type constants or function types of form $T \rightarrow T$. Such types are monomorphic. Typical examples are the types used in arithmetic values:
$$\begin{array}{ll} 3 & : \mathtt{Number} \\ \mathtt{add}\ 3\ 4 & : \mathtt{Number} \\ \mathtt{add} & : \mathtt{Number} \rightarrow \mathtt{Number} \rightarrow \mathtt{Number} \end{array}$$
Contrary to this, the untyped lambda calculus is neutral to typing at all, and many of its functions can be meaningfully applied to all type of arguments. The trivial example is the identity function $\mathtt{id} \equiv \lambda x.x$ which simply returns whatever value it is applied to. Less trivial examples include parametric types like lists. While polymorphism in general means that operations accept values of more than one type, the polymorphism used here is parametric. One finds the notation of type schemes in the literature, too, emphasizing the parametric nature of the polymorphism. Additionally, constants may be typed with (quantified) type variables. For example, the following type schemes quantify universally over $\alpha$, meaning that they are true for all possible $\alpha$:
$$\begin{array}{ll} \mathtt{cons} & : \forall \alpha . \alpha \rightarrow \mathtt{List}\ \alpha \rightarrow \mathtt{List}\ \alpha \\ \mathtt{nil} & : \forall \alpha . \mathtt{List}\ \alpha \\ \mathtt{id} & : \forall \alpha . \alpha \rightarrow \alpha \end{array}$$
Polymorphic types can become monomorphic by consistent substitution of their variables. 
Examples of monomorphic instances are:
$$\begin{array}{ll} \mathtt{id}' & : \mathtt{String} \rightarrow \mathtt{String} \\ \mathtt{nil}' & : \mathtt{List}\ \mathtt{Number} \end{array}$$
More generally, types are polymorphic when they contain type variables, while types without them are monomorphic. Contrary to the type systems used for example in Pascal (1970) or C (1972), which only support monomorphic types, HM is designed with emphasis on parametric polymorphism. The successors of the languages mentioned, like C++ (1985), focused on different types of polymorphism, namely subtyping in connection with object-oriented programming and overloading. While subtyping is incompatible with HM, a variant of systematic overloading is available in the HM-based type system of Haskell. === Let-polymorphism === When extending the type inference for the simply-typed lambda calculus towards polymorphism, one has to decide whether assigning a polymorphic type not only as type of an expression, but also as the type of a λ-bound variable is admissible. This would allow the generic identity type to be assigned to the variable 'id' in: (λ id . ... (id 3) ... (id "text") ... ) (λ x . x) Allowing this gives rise to the polymorphic lambda calculus; however, type inference in this system is not decidable. Instead, HM distinguishes variables that are immediately bound to an expression from more general λ-bound variables, calling the former let-bound variables, and allows polymorphic types to be assigned only to these. This leads to let-polymorphism where the above example takes the form let id = λ x . x in ... (id 3) ... (id "text") ... which can be typed with a polymorphic type for 'id'. 
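The difference between λ-bound and let-bound variables can be tried out with a small inference sketch. This is not the article's formal presentation: the tuple encoding of expressions and types, the helper names, and the use of one mutable substitution are all assumptions made for this example. It only aims to show that the let form of the 'id' example is accepted while the λ form is rejected.

```python
from itertools import count

_ids = count()
INT, STR = ("con", "int", ()), ("con", "string", ())

def fresh():
    return ("tvar", next(_ids))

def fn(a, b):
    return ("con", "->", (a, b))

def apply(s, t):
    """Resolve type t under substitution s (a dict from type variables to types)."""
    if t[0] == "tvar":
        return apply(s, s[t]) if t in s else t
    return ("con", t[1], tuple(apply(s, a) for a in t[2]))

def ftv(t):
    """Free type variables of a type."""
    return {t} if t[0] == "tvar" else set().union(*(ftv(a) for a in t[2]))

def unify(a, b, s):
    a, b = apply(s, a), apply(s, b)
    if a == b:
        return
    if a[0] == "tvar":
        if a in ftv(b):
            raise TypeError("infinite type")
        s[a] = b
    elif b[0] == "tvar":
        unify(b, a, s)
    elif a[1] == b[1] and len(a[2]) == len(b[2]):
        for x, y in zip(a[2], b[2]):
            unify(x, y, s)
    else:
        raise TypeError(f"cannot unify {a[1]} with {b[1]}")

def generalize(env, t, s):
    """Quantify the variables of t that are not free in the environment."""
    t = apply(s, t)
    env_ftv = set()
    for quantified, u in env.values():
        env_ftv |= ftv(apply(s, u)) - set(quantified)
    return (tuple(ftv(t) - env_ftv), t)

def infer(env, e, s):
    kind = e[0]
    if kind == "lit":
        return INT if isinstance(e[1], int) else STR
    if kind == "var":  # instantiate the scheme with fresh variables
        quantified, t = env[e[1]]
        return apply({q: fresh() for q in quantified}, t)
    if kind == "lam":  # λ-bound variables get a monomorphic type
        _, x, body = e
        a = fresh()
        return fn(a, infer({**env, x: ((), a)}, body, s))
    if kind == "app":
        _, f, arg = e
        tf, ta, result = infer(env, f, s), infer(env, arg, s), fresh()
        unify(tf, fn(ta, result), s)
        return result
    if kind == "let":  # let-bound variables may get a polymorphic type
        _, x, e1, e2 = e
        return infer({**env, x: generalize(env, infer(env, e1, s), s)}, e2, s)
    raise ValueError(kind)

def type_of(e):
    s = {}
    return apply(s, infer({}, e, s))

ident = ("lam", "x", ("var", "x"))
# Uses id at both int and string: (λ a . id "text") (id 3)
body = ("app", ("lam", "a", ("app", ("var", "id"), ("lit", "text"))),
        ("app", ("var", "id"), ("lit", 3)))
let_version = ("let", "id", ident, body)   # let id = λ x . x in ...
lam_version = ("app", ("lam", "id", body), ident)  # (λ id . ...) (λ x . x)
```

Calling `type_of(let_version)` yields the string type, while `type_of(lam_version)` raises a `TypeError`, because the single monomorphic type given to the λ-bound 'id' cannot accept both an integer and a string argument.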
As indicated, the expression syntax is extended to make the let-bound variables explicit, and by restricting the type system to allow only let-bound variable to have polymorphic types, while the parameters in lambda-abstractions must get a monomorphic type, type inference becomes decidable. == Overview == The remainder of this article proceeds as follows: The HM type system is defined. This is done by describing a deduction system that makes precise what expressions have what type, if any. From there, it works towards an implementation of the type inference method. After introducing a syntax-driven variant of the above deductive system, it sketches an efficient implementation (algorithm J), appealing mostly to the reader's metalogical intuition. Because it remains open whether algorithm J indeed realises the initial deduction system, a less efficient implementation (algorithm W), is introduced and its use in a proof is hinted. Finally, further topics related to the algorithm are discussed. The same description of the deduction system is used throughout, even for the two algorithms, to make the various forms in which the HM method is presented directly comparable. == The Hindley–Milner type system == The type system can be formally described by syntax rules that fix a language for the expressions, types, etc. The presentation here of such a syntax is not too formal, in that it is written down not to study the surface grammar, but rather the depth grammar, and leaves some syntactical details open. This form of presentation is usual. Building on this, typing rules are used to define how expressions and types are related. As before, the form used is a bit liberal. === Syntax === The expressions to be typed are exactly those of the lambda calculus extended with a let-expression as shown in the adjacent table. Parentheses can be used to disambiguate an expression. The application is left-binding and binds stronger than abstraction or the let-in construct. 
Types are syntactically split into two groups, monotypes and polytypes. ==== Monotypes ==== Monotypes always designate a particular type. Monotypes $\tau$ are syntactically represented as terms. Examples of monotypes include type constants like $\mathtt{int}$ or $\mathtt{string}$, and parametric types like $\mathtt{Map\ (Set\ string)\ int}$. The latter types are examples of applications of type functions, for example, from the set $\{\mathtt{Map^{2},\ Set^{1},\ string^{0},\ int^{0}},\ \rightarrow^{2}\}$, where the superscript indicates the number of type parameters. The complete set of type functions $C$ is arbitrary in HM, except that it must contain at least $\rightarrow^{2}$, the type of functions. It is often written in infix notation for convenience. For example, a function mapping integers to strings has type $\mathtt{int} \rightarrow \mathtt{string}$. Again, parentheses can be used to disambiguate a type expression. The application binds stronger than the infix arrow, which is right-binding. Type variables are admitted as monotypes. Monotypes are not to be confused with monomorphic types, which exc

  • Organizational metacognition

    Organizational metacognition

    Organizational metacognition is knowing what an organization knows, a concept related to metacognition, organizational learning, the learning organization and sensemaking. It is used to describe how organizations and teams develop an awareness of their own thinking, learning how to learn, where awareness of ignorance can motivate learning. The organizational deutero-learning concept identified by Argyris and Schon defines when organizations learn how to carry out single-loop and double-loop learning. It has also been described as learning how to learn through a process of collaborative inquiry and reflection (evaluative inquiry). "When an organization engages in deutero-learning its members learn about the previous context for learning. They reflect on and inquire into previous episodes of organizational learning, or failure to learn. They discover what they did that facilitated or inhibited learning, they invent new strategies for learning, they produce these strategies, and they evaluate and generalize what they have produced" Learning what facilitates and inhibits learning enables organizations to develop new strategies to develop their knowledge. For example, identification of a gap between perceived performance (such as satisfaction) and actual performance (outcomes) creates an awareness that makes the organization understand that learning needs to occur, driving appropriate changes to the environment and processes. == Learning prototypes == Wijnhoven (2001) grouped four learning prototypes that best meet learning needs, the match between these needs and learning norms dictating an organization's learning capabilities; deutero-learning is the acquisition of these capabilities. 
      • knowledge gap analysis
      • classification of problems to select operationally required knowledge and skills
      • coping with organizational tremors and jolts by anticipation, response and adjustments of behavioural repertoires
      • decisional uncertainty measurement
== Terminological ambiguities == Organizational metacognition and organizational deutero-learning have both been described as the concept or phenomenon where organizations learn how to learn. Argyris and Schon (1978) place deutero-learning into their cognitive theory of action framework, neglecting aspects of adaptive behaviour and context core to Bateson's (1972) original definitions. In order to resolve terminological ambiguities, Visser (2007) reviewed and reformulated the concept of deutero-learning as, "the behavioral adaptation to patterns of conditioning in relationships in organizational contexts, distinguishing it from meta-learning and planned learning" (pg. 659). == Significance == Organizational metacognition is considered a key norm to the prescriptive concept of the learning organization. Its significance has been recognized by industry, the military and in disaster response. == Examples in practice == Examples of poor metacognition (deutero-learning) have been described in knowledge network environments, "Knowledge networking is important to most competitive enterprises today. Enterprise knowledge is becoming ever more specialized in nature, so no single person or organization can know everything in detail. Hence addressing complex, multidisciplinary problems requires developing and accessing a network of knowledgeable people and organizations. The problem is, many otherwise knowledgeable people and organizations are not fully aware of their knowledge networks, and even more problematic, they are not aware that they are not aware. This focuses our attention toward organizational metacognition."

  • Seccomp

    Seccomp

    seccomp (short for secure computing) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit(), sigreturn(), read() and write() to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely. seccomp mode is enabled via the prctl(2) system call using the PR_SET_SECCOMP argument, or (since Linux kernel 3.17) via the seccomp(2) system call. seccomp mode used to be enabled by writing to a file, /proc/self/seccomp, but this method was removed in favor of prctl(). In some kernel versions, seccomp disables the RDTSC x86 instruction, which returns the number of elapsed processor cycles since power-on, used for high-precision timing. seccomp-bpf is an extension to seccomp that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH and vsftpd as well as the Google Chrome/Chromium web browsers on ChromeOS and Linux. (In this regard seccomp-bpf achieves functionality similar to the older systrace, but with more flexibility and higher performance; systrace seems to be no longer supported for Linux.) Some consider seccomp comparable to OpenBSD pledge(2) and FreeBSD capsicum(4). == History == seccomp was first devised by Andrea Arcangeli in January 2005 for use in public grid computing and was originally intended as a means of safely running untrusted compute-bound programs. It was merged into the Linux kernel mainline in kernel version 2.6.12, which was released on March 8, 2005. == Software using seccomp or seccomp-bpf ==
      • Android uses a seccomp-bpf filter in the zygote since Android 8.0 Oreo.
      • systemd's sandboxing options are based on seccomp.
      • QEMU, the Quick Emulator, the core component of modern virtualization together with KVM, uses seccomp via the --sandbox parameter.
      • Docker, software that allows applications to run inside isolated containers, can associate a seccomp profile with a container using the --security-opt parameter.
      • Arcangeli's CPUShare was the only known user of seccomp for a while. Writing in February 2009, Linus Torvalds expressed doubt whether seccomp was actually used by anyone. However, a Google engineer replied that Google was exploring using seccomp for sandboxing its Chrome web browser.
      • Firejail is an open source Linux sandbox program that utilizes Linux namespaces, seccomp, and other kernel-level security features to sandbox Linux and Wine applications.
      • As of Chrome version 20, seccomp-bpf is used to sandbox Adobe Flash Player. As of Chrome version 23, seccomp-bpf is used to sandbox the renderers.
      • Snap specifies the shape of its application sandbox using "interfaces", which snapd translates to seccomp, AppArmor and other security constructs.
      • vsftpd uses seccomp-bpf sandboxing as of version 3.0.0.
      • OpenSSH has supported seccomp-bpf since version 6.0.
      • Mbox uses ptrace along with seccomp-bpf to create a secure sandbox with less overhead than ptrace alone.
      • LXD, an Ubuntu "hypervisor" for containers.
      • Firefox and Firefox OS, which use seccomp-bpf.
      • Tor supports seccomp since 0.2.5.1-alpha.
      • Lepton, a JPEG compression tool developed by Dropbox, uses seccomp.
      • Kafel is a configuration language that converts readable policies into seccomp-bpf bytecode.
      • Subgraph OS uses seccomp-bpf.
      • Flatpak uses seccomp for process isolation.
      • Bubblewrap is a lightweight sandbox application developed from Flatpak.
      • minijail uses seccomp for process isolation.
      • SydBox uses seccomp-bpf to improve the runtime and security of the ptrace sandboxing used to sandbox package builds on the Exherbo Linux distribution.
      • File, a Unix program to determine filetypes, uses seccomp to restrict its runtime environment.
      • Zathura, a minimalistic document viewer, uses a seccomp filter to implement different sandbox modes.
      • Tracker, an indexing and preview application for the GNOME desktop environment, uses seccomp to prevent automatic exploitation of parsing vulnerabilities in media files.
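The one-way transition into strict mode described at the start of this article can be sketched as follows, assuming a Linux system where Python's ctypes can reach the C library's prctl(). The function name and return strings are made up for this example; the two constants are the values defined in the Linux headers. The transition is made in a forked child so that the calling process is not restricted.

```python
import ctypes
import os
import signal

PR_SET_SECCOMP = 22       # from <linux/prctl.h>
SECCOMP_MODE_STRICT = 1   # from <linux/seccomp.h>

def strict_mode_outcome():
    """Fork a child that enters strict seccomp mode and report how it ended."""
    # Look up prctl before forking so a missing symbol fails in the parent.
    prctl = ctypes.CDLL(None, use_errno=True).prctl
    pid = os.fork()
    if pid == 0:
        try:
            # One-way transition: there is no call to leave strict mode again.
            if prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) == 0:
                os.getpid()  # getpid() is not among the permitted system calls
        finally:
            # Only reached if strict mode could not be enabled; inside strict
            # mode this call would itself be refused by the kernel.
            os._exit(2)
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    return "exited with status %d" % os.WEXITSTATUS(status)
```

On a kernel that permits strict mode the child is expected to be terminated by a signal as soon as it attempts the forbidden call. An exit status of 2 indicates that strict mode was refused, for example by an outer sandbox.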

  • Bibliographic database

    Bibliographic database

    A bibliographic database is a database of bibliographic records. This is an organised online collection of references to published written works like journal and newspaper articles, conference proceedings, reports, government and legal publications, patents and books. In contrast to library catalogue entries, a majority of the records in bibliographic databases describe articles and conference papers rather than complete monographs, and they generally contain very rich subject descriptions in the form of keywords, subject classification terms, or abstracts. A bibliographic database may cover a wide range of topics or one academic field like computer science. A significant number of bibliographic databases are marketed under a trade name by licensing agreement from vendors, or directly from their makers: the indexing and abstracting services. Many bibliographic databases have evolved into digital libraries, providing the full text of the organised contents: for instance, CORE also organises and mirrors scholarly articles, and OurResearch develops a search engine for open access content in Unpaywall. Others merge with non-bibliographic and scholarly databases to create more complete disciplinary search engine systems, such as Chemical Abstracts or Entrez. == History == Prior to the mid-20th century, individuals searching for published literature had to rely on printed bibliographic indexes, generated manually from index cards. During the early 1960s computers were used to digitize text for the first time; the purpose was to reduce the cost and time required to publish two American abstracting journals, the Index Medicus of the National Library of Medicine and the Scientific and Technical Aerospace Reports of the National Aeronautics and Space Administration (NASA). By the late 1960s, such bodies of digitized alphanumeric information, known as bibliographic and numeric databases, constituted a new type of information resource. 
Online interactive retrieval became commercially viable in the early 1970s over private telecommunications networks. The first services offered a few databases of indexes and abstracts of scholarly literature. These databases contained bibliographic descriptions of journal articles that were searchable by keywords in author and title, and sometimes by journal name or subject heading. The user interfaces were crude, the access was expensive, and searching was done by librarians on behalf of "end users".

  • Interviewer effect

    Interviewer effect

    The interviewer effect (also called interviewer variance or interviewer error) is the distortion of response to an interviewer-administered data collection effort which results from differential reactions to the social style and personality of interviewers or to their presentation of particular questions. The use of fixed-wording questions is one method of reducing interviewer bias. Anthropological research and case studies are also affected by the problem, which is exacerbated by the self-fulfilling prophecy when the researcher is also the interviewer. More broadly, the term covers any effect on data gathered from interviewing people that is caused by the behavior or characteristics (real or perceived) of the interviewer. Interviewer effects can also be associated with the characteristics of the interviewer, such as race. Whether black respondents are interviewed by white interviewers or black interviewers has a strong impact on their responses to both attitude questions and behavioral ones. In the latter case, for example, if black respondents are interviewed by black interviewers in pre-election surveys, they are more likely to actually vote in the upcoming election than if they are interviewed by white interviewers. Furthermore, the race of the interviewer can also affect answers to factual questions that might take the form of a test of how informed the respondent is. Black respondents in a survey of political knowledge, for example, get fewer correct answers to factual questions about politics when interviewed by white interviewers than when interviewed by black interviewers. This is consistent with the research literature on stereotype threat, which finds diminished test performance of potentially stigmatised groups when the interviewer or test supervisor is from a perceived higher status group. Interviewer effects can be mitigated somewhat by randomly assigning subjects to different interviewers, or by using tools such as computer-assisted telephone interviewing (CATI).
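The random assignment mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_interviewers`, the respondent labels and the interviewer labels are hypothetical, and a real survey design would usually add constraints (region, language, workload) that are not modelled here.

```python
import random


def assign_interviewers(respondents, interviewers, seed=None):
    """Randomly assign each respondent to one interviewer (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(respondents)
    rng.shuffle(shuffled)
    # Deal the shuffled respondents out round-robin, so that interviewer
    # workloads differ by at most one respondent.
    return {
        respondent: interviewers[position % len(interviewers)]
        for position, respondent in enumerate(shuffled)
    }


assignment = assign_interviewers(
    [f"R{n}" for n in range(1, 11)], ["A", "B", "C"], seed=42
)
```

Because the assignment does not depend on any respondent characteristic, systematic differences between interviewers are spread across the sample rather than concentrated in one subgroup.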

    Read more →
  • Artificial intelligence in industry

    Artificial intelligence in industry

    Industrial artificial intelligence, or industrial AI, refers to the application of artificial intelligence to industrial business processes. Unlike general artificial intelligence, which is a frontier research discipline that aims to build computerized systems that perform tasks requiring human intelligence, industrial AI is more concerned with the application of such technologies to address industrial pain-points for customer value creation, productivity improvement, cost reduction, site optimization, predictive analysis and insight discovery. Artificial intelligence and machine learning have become key enablers for leveraging data in production in recent years due to several factors: more affordable sensors and automated data acquisition; more powerful computers that perform more complex tasks faster and at lower cost; and faster connectivity infrastructure and more accessible cloud services for data management and outsourced computing power. == Categories == Possible applications of industrial AI and machine learning in the production domain can be divided into seven application areas: market and trend analysis, machinery and equipment, intralogistics, production process, supply chain, building, and product. Each application area can be further divided into specific application scenarios that describe concrete AI/ML scenarios in production. While some application areas have a direct connection to production processes, others cover production-adjacent fields like logistics or the factory building. Collaborative robots are an example from the application scenario Process Design & Innovation. Collaborative robotic arms are able to learn the motion and path demonstrated by human operators and perform the same task. Predictive and preventive maintenance through data-driven machine learning are application scenarios from the Machinery & Equipment application area. 
== Challenges == In contrast to entirely virtual systems, in which ML applications are already widespread today, real-world production processes are characterized by the interaction between the virtual and the physical world. Data is recorded using sensors and processed on computational entities and, if desired, actions and decisions are translated back into the physical world via actuators or by human operators. This poses major challenges for the application of ML in production engineering systems. These challenges are attributable to the encounter of process, data and model characteristics: The production domain's high reliability requirements, high risk and loss potential, the multitude of heterogeneous data sources and the non-transparency of ML model functionality impede a faster adoption of ML in real-world production processes. In particular, production data comprises a variety of different modalities, semantics and quality. Furthermore, production systems are dynamic, uncertain and complex, and engineering and manufacturing problems are data-rich but information-sparse. Besides that, due to the variety of use cases and data characteristics, problem-specific data sets are required, which are difficult to acquire, hindering both practitioners and academic researchers in this domain. === Process and industry characteristics === The domain of production engineering can be considered as a rather conservative industry when it comes to the adoption of advanced technology and their integration into existing processes. This is due to high demands on reliability of the production systems resulting from the potentially high economic harm of reduced process effectiveness due to e.g., additional unplanned downtime or insufficient product qualities. In addition, the specifics of machining equipment and products prevent area-wide adoptions across a variety of processes. 
Besides the technical reasons, the reluctant adoption of ML is fueled by a lack of IT and data science expertise across the domain. === Data characteristics === The data collected in production processes mainly stem from frequently sampling sensors that estimate the state of a product, a process, or the environment in the real world. Sensor readings are susceptible to noise and represent only an estimate of reality under uncertainty. Production data typically comprises multiple distributed data sources, resulting in various data modalities (e.g., images from visual quality control systems, time-series sensor readings, or cross-sectional job and product information). Inconsistencies in data acquisition lead to low signal-to-noise ratios, low data quality and great effort in data integration, cleaning and management. In addition, as a result of mechanical and chemical wear of production equipment, process data is subject to various forms of data drift. === Machine learning model characteristics === ML models are considered black-box systems, given their complexity and the opacity of their input-output relation. This reduces the comprehensibility of the system behavior and thus also the acceptance by plant operators. Due to the lack of transparency and the stochasticity of these models, no deterministic proof of functional correctness can be achieved, complicating the certification of production equipment. Given their inherently unrestricted prediction behavior, ML models are vulnerable to erroneous or manipulated data, further risking the reliability of the production system because of a lack of robustness and safety. In addition to high development and deployment costs, data drift causes high maintenance costs, which is a disadvantage compared to purely deterministic programs. 
== Standard processes for data science in production == The development of ML applications – starting with the identification and selection of the use case and ending with the deployment and maintenance of the application – follows dedicated phases that can be organized in standard process models. The process models assist in structuring the development process and defining requirements that must be met in each phase to enter the next phase. The standard processes can be classified into generic and domain-specific ones. Generic standard processes (e.g., CRISP-DM, ASUM-DM, or knowledge discovery in databases (KDD)) describe a generally valid methodology and are thus independent of individual domains. Domain-specific processes, on the other hand, consider the specific peculiarities and challenges of particular application areas. The Machine Learning Pipeline in Production is a domain-specific data science methodology that is inspired by the CRISP-DM model and was specifically designed to be applied in fields of engineering and production technology. To address the core challenges of ML in engineering – process, data, and model characteristics – the methodology especially focuses on use-case assessment, achieving a common data and process understanding, data integration, data preprocessing of real-world production data, and the deployment and certification of real-world ML applications. == Industrial data sources == Most artificial intelligence and machine learning applications in industrial settings are founded on comprehensive datasets from the respective fields. Those datasets act as the basis for training the employed models. In other domains, like computer vision, speech recognition or language models, extensive reference datasets (e.g. ImageNet, Librispeech, The People's Speech) and data scraped from the open internet are frequently used for this purpose. 
Such datasets rarely exist in the industrial context because of high confidentiality requirements and the high specificity of the data. Industrial applications of artificial intelligence are therefore often faced with the problem of data availability. For these reasons, existing open datasets applicable to industrial applications often originate from public institutions such as governmental agencies or universities, or from data analysis competitions hosted by companies. In addition to this, data sharing platforms exist. However, most of these platforms have no industrial focus and offer limited filtering abilities regarding industrial data sources.

    Read more →
  • Multi-model database

    Multi-model database

    In the field of database design, a multi-model database is a database management system designed to support multiple data models against a single, integrated backend. In contrast, most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated. Document, graph, relational, and key–value models are examples of data models that may be supported by a multi-model database. == Background == The relational data model became popular after its publication by Edgar F. Codd in 1970. Due to increasing requirements for horizontal scalability and fault tolerance, NoSQL databases became prominent after 2009. NoSQL databases use a variety of data models, with document, graph, and key–value models being popular. A multi-model database is a database that can store, index and query data in more than one model. For some time, databases have primarily supported only one model, such as: relational database, document-oriented database, graph database or triplestore. A database that combines many of these is multi-model. This should not be confused with multimodal database systems such as Pixeltable or ApertureDB, which focus on unified management of different media types (images, video, audio, text) rather than different data models. For some time, it was all but forgotten (or considered irrelevant) that there were any other database models besides relational. The relational model and notion of third normal form were the default standard for all data storage. However, prior to the dominance of relational data modeling, from about 1980 to 2005, the hierarchical database model was commonly used. Since 2000 or 2010, many NoSQL models that are non-relational, including documents, triples, key–value stores and graphs are popular. Arguably, geospatial data, temporal data, and text data are also separate models, though indexed, queryable text data is generally termed a "search engine" rather than a database. 
The word "multi-model" was first associated with databases on May 30, 2012 in Cologne, Germany, during Luca Garulli's keynote "NoSQL Adoption – What’s the Next Step?". Garulli envisioned the evolution of first-generation NoSQL products into new products with more features, able to serve multiple use cases. The idea of multi-model databases can be traced back to Object–Relational Data Management Systems (ORDBMS) in the early 1990s and, in a broader scope, even to federated and integrated DBMSs in the early 1980s. An ORDBMS manages different types of data such as relational, object, text and spatial by plugging domain-specific data types, functions and index implementations into the DBMS kernel. A multi-model database is most directly a response to the "polyglot persistence" approach, described by Martin Fowler, of knitting together multiple database products, each handling a different model, to achieve a multi-model capability. This strategy has two major disadvantages: it leads to a significant increase in operational complexity, and there is no support for maintaining data consistency across the separate data stores. Multi-model databases have begun to fill this gap. They are intended to offer the data modeling advantages of polyglot persistence without its disadvantages. Operational complexity, in particular, is reduced through the use of a single data store. == Benchmarking multi-model databases == As more and more platforms are proposed to deal with multi-model data, a few works have addressed benchmarking multi-model databases. For instance, Pluciennik, Oliveira, and UniBench reviewed existing multi-model databases and made an evaluation effort towards comparing multi-model databases and other SQL and NoSQL databases respectively. 
They also pointed out advantages of multi-model databases over single-model databases. == Architecture == The main difference between the available multi-model databases is related to their architectures. Multi-model databases can support different models either within the engine or via different layers on top of the engine. Some products may provide an engine which supports documents and graphs, while others provide layers on top of a key–value store. With a layered architecture, each data model is provided via its own component. == User-defined data models == In addition to offering multiple data models in a single data store, some databases allow developers to easily define custom data models. This capability is enabled by ACID transactions with high performance and scalability. In order for a custom data model to support concurrent updates, the database must be able to synchronize updates across multiple keys. ACID transactions, if they are sufficiently performant, allow such synchronization. JSON documents, graphs, and relational tables can all be implemented in a manner that inherits the horizontal scalability and fault-tolerance of the underlying data store. == Theoretical Foundation for Multi-Model Databases == The traditional theory of relations is not enough to accurately describe multi-model database systems. Recent research is focused on developing a new theoretical foundation for these systems. Category theory can provide a unified, rigorous language for modeling, integrating, and transforming different data models. By representing multi-model data as sets and their relationships as functions or relations within the Set category, we can create a formal framework to describe, manipulate, and understand various data models and how they interact.
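As a rough illustration of the layered architecture described above, the following Python sketch puts a document layer and a graph layer on top of one shared key-value store. All class names, method names and key prefixes (`KeyValueStore`, `DocumentLayer`, `GraphLayer`, `doc:`, `edges:`) are hypothetical and do not correspond to any particular product. The sketch also omits transactions, so it does not show the cross-key synchronization that ACID transactions would provide.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory key-value engine standing in for a real backend."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class DocumentLayer:
    """Document model: JSON documents stored under 'doc:<id>' keys."""

    def __init__(self, store):
        self.store = store

    def save(self, doc_id, document):
        self.store.put(f"doc:{doc_id}", json.dumps(document))

    def load(self, doc_id):
        raw = self.store.get(f"doc:{doc_id}")
        return None if raw is None else json.loads(raw)


class GraphLayer:
    """Graph model: adjacency lists stored under 'edges:<node>' keys."""

    def __init__(self, store):
        self.store = store

    def add_edge(self, src, dst):
        neighbours = self.store.get(f"edges:{src}", [])
        if dst not in neighbours:
            self.store.put(f"edges:{src}", neighbours + [dst])

    def neighbours(self, node):
        return list(self.store.get(f"edges:{node}", []))


# Both layers share one store, so a query can mix the two models.
store = KeyValueStore()
docs, graph = DocumentLayer(store), GraphLayer(store)
docs.save("alice", {"name": "Alice", "city": "Cologne"})
docs.save("bob", {"name": "Bob", "city": "Utrecht"})
graph.add_edge("alice", "bob")
friends = [docs.load(node)["name"] for node in graph.neighbours("alice")]
```

The final line follows graph edges and then reads documents from the same backend, which is the kind of cross-model query that polyglot persistence would spread over separate products.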

    Read more →
  • Emotion recognition

    Emotion recognition

    Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables. == Human == Humans show a great deal of variability in their abilities to recognize emotion. A key point to keep in mind when learning about automated emotion recognition is that there are several sources of "ground truth", or truth about what the real emotion is. Suppose we are trying to recognize the emotions of Alex. One source is "what would most people say that Alex is feeling?" In this case, the 'truth' may not correspond to what Alex feels, but may correspond to what most people would say it looks like Alex feels. For example, Alex may actually feel sad, but he puts on a big smile and then most people say he looks happy. If an automated method achieves the same results as a group of observers it may be considered accurate, even if it does not actually measure what Alex truly feels. Another source of 'truth' is to ask Alex what he truly feels. This works if Alex has a good sense of his internal state, and wants to tell you what it is, and is capable of putting it accurately into words or a number. However, some people are alexithymic and do not have a good sense of their internal feelings, or they are not able to communicate them accurately with words and numbers. In general, getting to the truth of what emotion is actually present can take some work, can vary depending on the criteria that are selected, and will usually involve maintaining some level of uncertainty. 
== Automatic == Decades of scientific research have been conducted developing and evaluating methods for automated emotion recognition. There is now an extensive literature proposing and evaluating hundreds of different kinds of methods, leveraging techniques from multiple areas, such as signal processing, machine learning, computer vision, and speech processing. Different methodologies and techniques may be employed to interpret emotion, such as Bayesian networks, Gaussian mixture models, hidden Markov models and deep neural networks. === Approaches === The accuracy of emotion recognition is usually improved when it combines the analysis of human expressions from multimodal forms such as texts, physiology, audio, or video. Different emotion types are detected through the integration of information from facial expressions, body movement and gestures, and speech. The technology is said to contribute to the emergence of the so-called emotional or emotive Internet. The existing approaches in emotion recognition to classify certain emotion types can be generally classified into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. ==== Knowledge-based techniques ==== Knowledge-based techniques (sometimes referred to as lexicon-based techniques) utilize domain knowledge and the semantic and syntactic characteristics of text and potentially spoken language in order to detect certain emotion types. In this approach, it is common to use knowledge-based resources during the emotion classification process, such as WordNet, SenticNet, ConceptNet, and EmotiNet, to name a few. One of the advantages of this approach is the accessibility and economy brought about by the large availability of such knowledge-based resources. A limitation of this technique, on the other hand, is its inability to handle concept nuances and complex linguistic rules. 
Knowledge-based techniques can be mainly classified into two categories: dictionary-based and corpus-based approaches. Dictionary-based approaches find opinion or emotion seed words in a dictionary and search for their synonyms and antonyms to expand the initial list of opinions or emotions. Corpus-based approaches, on the other hand, start with a seed list of opinion or emotion words, and expand the database by finding other words with context-specific characteristics in a large corpus. While corpus-based approaches take context into account, their performance still varies across domains, since a word in one domain can have a different orientation in another domain. ==== Statistical methods ==== Statistical methods commonly involve the use of different supervised machine learning algorithms in which a large set of annotated data is fed into the algorithms for the system to learn and predict the appropriate emotion types. Machine learning algorithms generally provide more reasonable classification accuracy compared to other approaches, but one of the challenges in achieving good results in the classification process is the need for a sufficiently large training set. Some of the most commonly used machine learning algorithms include Support Vector Machines (SVM), Naive Bayes, and Maximum Entropy. Deep learning, which can be applied in supervised as well as unsupervised settings, is also widely employed in emotion recognition. Well-known deep learning algorithms include different architectures of Artificial Neural Network (ANN) such as Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), and Extreme Learning Machine (ELM). The popularity of deep learning approaches in the domain of emotion recognition may be mainly attributed to their success in related applications such as computer vision, speech recognition, and Natural Language Processing (NLP). 
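The dictionary-based approach described in this section can be illustrated with a minimal Python sketch. The seed lexicon, the synonym dictionary and the function names (`expand_lexicon`, `classify`) are invented for illustration; a real system would draw on a resource such as WordNet and would need far more careful tokenization and handling of negation.

```python
# Hypothetical seed words and synonym dictionary, invented for this example.
SEED_WORDS = {"happy": "joy", "sad": "sadness", "angry": "anger"}
SYNONYMS = {
    "happy": ["glad", "cheerful"],
    "sad": ["unhappy", "gloomy"],
    "angry": ["furious", "irritated"],
}


def expand_lexicon(seeds, synonyms):
    """Grow the seed list by giving each synonym the emotion of its seed word."""
    lexicon = dict(seeds)
    for word, emotion in seeds.items():
        for synonym in synonyms.get(word, []):
            lexicon.setdefault(synonym, emotion)
    return lexicon


def classify(text, lexicon):
    """Return the emotion whose lexicon words occur most often, or 'neutral'."""
    counts = {}
    for token in text.lower().split():
        emotion = lexicon.get(token.strip(".,!?"))
        if emotion:
            counts[emotion] = counts.get(emotion, 0) + 1
    return max(counts, key=counts.get) if counts else "neutral"


lexicon = expand_lexicon(SEED_WORDS, SYNONYMS)
label = classify("I was gloomy and unhappy, then furious.", lexicon)
```

The sketch also shows the limitation noted above: a sentence such as "not happy at all" would still be counted towards joy, because word lookup alone does not capture linguistic rules like negation.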
==== Hybrid approaches ==== Hybrid approaches in emotion recognition are essentially a combination of knowledge-based techniques and statistical methods, which exploit complementary characteristics from both techniques. Some of the works that have applied an ensemble of knowledge-driven linguistic elements and statistical methods include sentic computing and iFeel, both of which have adopted the concept-level knowledge-based resource SenticNet. The role of such knowledge-based resources in the implementation of hybrid approaches is highly important in the emotion classification process. Since hybrid techniques gain from the benefits offered by both knowledge-based and statistical approaches, they tend to have better classification performance as opposed to employing knowledge-based or statistical methods independently. A downside of using hybrid techniques however, is the computational complexity during the classification process. === Datasets === Data is an integral part of the existing approaches in emotion recognition and in most cases it is a challenge to obtain annotated data that is necessary to train machine learning algorithms. 
For the task of classifying different emotion types from multimodal sources in the form of texts, audio, videos or physiological signals, the following datasets are available:
    HUMAINE: provides natural clips with emotion words and context labels in multiple modalities.
    Belfast database: provides clips with a wide range of emotions from TV programs and interview recordings.
    SEMAINE: provides audiovisual recordings between a person and a virtual agent and contains emotion annotations such as angry, happy, fear, disgust, sadness, contempt, and amusement.
    IEMOCAP: provides recordings of dyadic sessions between actors and contains emotion annotations such as happiness, anger, sadness, frustration, and neutral state.
    eNTERFACE: provides audiovisual recordings of subjects from seven nationalities and contains emotion annotations such as happiness, anger, sadness, surprise, disgust, and fear.
    DEAP: provides electroencephalography (EEG), electrocardiography (ECG), and face video recordings, as well as emotion annotations in terms of valence, arousal, and dominance of people watching film clips.
    DREAMER: provides electroencephalography (EEG) and electrocardiography (ECG) recordings, as well as emotion annotations in terms of valence, arousal, and dominance of people watching film clips.
    MELD: a multiparty conversational dataset where each utterance is labeled with emotion and sentiment. MELD provides conversations in video format and is hence suitable for multimodal sentiment analysis and emotion recognition, dialogue systems, and emotion recognition in conversations.
    MuSe: provides audiovisual recordings of natural interactions between a person and an object. It has discrete and continuous emotion annotations in terms of valence, arousal and trustworthiness, as well as speech topics useful for multimodal sentiment analysis and emotion recognition.
    UIT-VSMEC: a standard Vietnamese Social Media Emotion Corpus with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese.

    Read more →
  • Kleene's algorithm

    Kleene's algorithm

    In theoretical computer science, in particular in formal language theory, Kleene's algorithm transforms a given nondeterministic finite automaton (NFA) into a regular expression. Together with other conversion algorithms, it establishes the equivalence of several description formats for regular languages. Alternative presentations of the same method include the "elimination method" attributed to Brzozowski and McCluskey, the algorithm of McNaughton and Yamada, and the use of Arden's lemma. == Algorithm description == According to Gross and Yellen (2004), the algorithm can be traced back to Kleene (1956). A presentation of the algorithm in the case of deterministic finite automata (DFAs) is given in Hopcroft and Ullman (1979). The presentation of the algorithm for NFAs below follows Gross and Yellen (2004). Given a nondeterministic finite automaton $M = (Q, \Sigma, \delta, q_0, F)$, with $Q = \{q_0, \dots, q_n\}$ its set of states, the algorithm computes the sets $R^k_{ij}$ of all strings that take $M$ from state $q_i$ to $q_j$ without going through any state numbered higher than $k$. Here, "going through a state" means entering and leaving it, so both $i$ and $j$ may be higher than $k$, but no intermediate state may. Each set $R^k_{ij}$ is represented by a regular expression; the algorithm computes them step by step for $k = -1, 0, \dots, n$. Since there is no state numbered higher than $n$, the regular expression $R^n_{0j}$ represents the set of all strings that take $M$ from its start state $q_0$ to $q_j$. If $F = \{q_1, \dots, q_f\}$ is the set of accept states, the regular expression $R^n_{01} \mid \dots \mid R^n_{0f}$ represents the language accepted by $M$. The initial regular expressions, for $k = -1$, are computed as follows for $i \neq j$: $R^{-1}_{ij} = a_1 \mid \dots \mid a_m$ where $q_j \in \delta(q_i, a_1), \dots, q_j \in \delta(q_i, a_m)$, and as follows for $i = j$: $R^{-1}_{ii} = a_1 \mid \dots \mid a_m \mid \varepsilon$ where $q_i \in \delta(q_i, a_1), \dots, q_i \in \delta(q_i, a_m)$. In other words, $R^{-1}_{ij}$ mentions all letters that label a transition from $i$ to $j$, and we also include $\varepsilon$ in the case where $i = j$. 
After that, in each step the expressions $R^k_{ij}$ are computed from the previous ones by $R^k_{ij} = R^{k-1}_{ik} (R^{k-1}_{kk})^* R^{k-1}_{kj} \mid R^{k-1}_{ij}$. Another way to understand the operation of the algorithm is as an "elimination method", where the states from $0$ to $n$ are successively removed: when state $k$ is removed, the regular expression $R^{k-1}_{ij}$, which describes the words that label a path from state $i > k$ to state $j > k$, is rewritten into $R^k_{ij}$ so as to take into account the possibility of going via the "eliminated" state $k$. By induction on $k$, it can be shown that the length of each expression $R^k_{ij}$ is at most $\frac{1}{3}\left(4^{k+1}(6s+7) - 4\right)$ symbols, where $s$ denotes the number of characters in $\Sigma$. Therefore, the length of the regular expression representing the language accepted by $M$ is at most $\frac{1}{3}\left(4^{n+1}(6s+7)f - f - 3\right)$ symbols, where $f$ denotes the number of final states. This exponential blowup is inevitable, because there exist families of DFAs for which any equivalent regular expression must be of exponential size. In practice, the size of the regular expression obtained by running the algorithm can be very different depending on the order in which the states are considered by the procedure, i.e., the order in which they are numbered from $0$ to $n$. == Example == The example automaton can be described as $M = (Q, \Sigma, \delta, q_0, F)$ with the set of states $Q = \{q_0, q_1, q_2\}$, the input alphabet $\Sigma = \{a, b\}$, the transition function $\delta$ with $\delta(q_0,a) = q_0$, $\delta(q_0,b) = q_1$, $\delta(q_1,a) = q_2$, $\delta(q_1,b) = q_1$, $\delta(q_2,a) = q_1$, and $\delta(q_2,b) = q_1$, the start state $q_0$, and the set of accept states $F = \{q_1\}$. By the definition above, Kleene's algorithm computes the initial regular expressions as $R^{-1}_{00} = a \mid \varepsilon$, $R^{-1}_{01} = b$, $R^{-1}_{02} = \emptyset$, $R^{-1}_{10} = \emptyset$, $R^{-1}_{11} = b \mid \varepsilon$, $R^{-1}_{12} = a$, $R^{-1}_{20} = \emptyset$, $R^{-1}_{21} = a \mid b$, and $R^{-1}_{22} = \varepsilon$. After that, the $R^k_{ij}$ are computed from the $R^{k-1}_{ij}$ step by step for $k = 0, 1, 2$. Kleene algebra equalities are used to simplify the regular expressions as much as possible. Since $q_0$ is the start state and $q_1$ is the only accept state, the regular expression $R^2_{01}$ denotes the set of all strings accepted by the automaton.
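The recurrence can be tried out on the example automaton with a short Python sketch. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (`alt`, `cat`, `star`, `kleene`, `accepts`) are made up here, regular expressions are built as strings in Python `re` syntax, the empty set is represented by `None` and $\varepsilon$ by the empty string, and only trivial simplifications are applied, so the result is much longer than a hand-simplified expression.

```python
import re
from itertools import product


def alt(x, y):
    """Union of two expressions; None stands for the empty set."""
    if x is None:
        return y
    if y is None:
        return x
    if x == y:
        return x
    if x == "":                      # epsilon | y
        return f"(?:{y})?"
    if y == "":                      # x | epsilon
        return f"(?:{x})?"
    return f"(?:{x}|{y})"


def cat(x, y):
    """Concatenation; anything concatenated with the empty set is empty."""
    return None if x is None or y is None else x + y


def star(x):
    """Kleene star; the star of the empty set or of epsilon is epsilon."""
    return "" if x in (None, "") else f"(?:{x})*"


def kleene(n_states, alphabet, delta, start, accepting):
    states = range(n_states)
    # k = -1: letters labelling a direct transition, plus epsilon when i == j.
    R = [[None] * n_states for _ in states]
    for i in states:
        for j in states:
            expr = None
            for letter in alphabet:
                if j in delta.get((i, letter), ()):
                    expr = alt(expr, re.escape(letter))
            R[i][j] = alt(expr, "") if i == j else expr
    # k = 0 .. n: R^k_ij = R^{k-1}_ik (R^{k-1}_kk)* R^{k-1}_kj | R^{k-1}_ij
    for k in states:
        R = [[alt(cat(cat(R[i][k], star(R[k][k])), R[k][j]), R[i][j])
              for j in states] for i in states]
    result = None
    for f in sorted(accepting):
        result = alt(result, R[start][f])
    return result


# The example automaton: states 0, 1, 2 stand for q0, q1, q2.
DELTA = {(0, "a"): {0}, (0, "b"): {1}, (1, "a"): {2},
         (1, "b"): {1}, (2, "a"): {1}, (2, "b"): {1}}
ACCEPT = {1}
regex = kleene(3, "ab", DELTA, 0, ACCEPT)


def accepts(word):
    """Run the (deterministic) example automaton directly on a word."""
    state = 0
    for ch in word:
        (state,) = DELTA[(state, ch)]
    return state in ACCEPT


# Compare the computed expression with the automaton on all short words.
agree = all(
    (re.fullmatch(regex, word) is not None) == accepts(word)
    for n in range(7)
    for word in map("".join, product("ab", repeat=n))
)
```

Here `agree` records whether the computed expression and the automaton accept exactly the same words of length up to six. Reading the language off the automaton by hand gives `a*b(b|a(a|b))*`, which is far shorter than the unsimplified string in `regex`; this is the gap that the Kleene algebra simplifications are meant to close.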

    Read more →
  • Secure Electronic Delivery

    Secure Electronic Delivery

    Secure Electronic Delivery (SED) is a service created in 2003 and provided by the British Library Document Supply Service (BLDSS). Its purpose is to enable faster delivery of digital materials as encrypted, copyright-compliant PDF documents to a personal e-mail address. These documents are supplied from the British Library via its On Demand service. When the British Library supplies articles electronically, it sends them securely in order to ensure that their usage is permitted (research purposes) and that copyright law is observed. == Methods == As the publishing industry, authors and creators become highly protective of their assets and intellectual property, they impose strict rules on delivery methods to prevent copyright infringement. Nowadays, DRM-enabled secure delivery appears to be the most widely used solution to address issues faced by libraries in supplying ebooks and digital materials to their users. SED, one of these solutions, uses Adobe LiveCycle Digital Rights Management (LCDRM) as an encryption method to deliver documents. == Advantages == SED offers convenience, quality and speed, as documents are delivered upon request at any location and on any device. Requested articles are scanned for high-quality reproduction and can be opened anywhere on any machine, including mobile devices. == Restrictions == The following restrictions hold in a SED service implementation:
    The digital material is accessible only for 14 days via a link sent in a personal message.
    For copyright reasons, the material can be opened only once, can be saved for 14 days, and does not allow copy-paste actions.
    Upon display, the material must be printed from the same device and can be reprinted only once.
    The On Demand encryption technology works best on the default Safari browser, although other browsers may accommodate it.

    Read more →
  • Toad (software)

    Toad (software)

    Toad is a database management toolset from Quest Software for managing relational and non-relational databases using SQL, aimed at database developers, database administrators, and data analysts. The Toad toolset runs against Oracle, SQL Server, IBM DB2 (LUW & z/OS), SAP and MySQL. A Toad product for data preparation supports many data platforms. == History == A practicing Oracle DBA, Jim McDaniel, designed Toad for his own use in the mid-1990s. He called it Tool for Oracle Application Developers, shortened to "TOAD". McDaniel initially distributed the tool as shareware and later online as freeware. Quest Software acquired TOAD in October 1998. Quest Software itself was acquired by Dell in 2012 to form Dell Software. In June 2016, Dell announced the sale of their software division, including the Quest business, to Francisco Partners and Elliott Management Corporation. On October 31, 2016, the sale was finalized. On November 1, 2016, the sale of Dell Software to Francisco Partners and Elliott Management was completed, and the company re-launched as Quest Software. == Features ==
    Connection Manager - Allows users to connect natively to the vendor’s database, whether on-premise or DBaaS.
    Browser - Allows users to browse all the different database/schema objects and their properties for effective management.
    Editor - A way to create and maintain scripts and database code, with debugging and integration with source control.
    Unit Testing (Oracle) - Ensures code is functionally tested before it is released into production.
    Static code review (Oracle) - Ensures code meets the required quality level using a rules-based system.
    SQL Optimization - Provides developers with a way to tune and optimize SQL statements and database code without relying on a DBA. Advanced optimization enables DBAs to tune SQL effectively in production.
    Scalability testing and database workload replay - Ensures that database code and SQL will scale properly before they are released into production.
== Books == Toad Pocket Reference for Oracle plsql 1st Edition by Jim McDaniel and Patrick McGrath, O'Reilly, 2002 (ISBN 0596003374, ISBN 978-0-596-00337-1) Toad Pocket Reference for Oracle 2nd Edition by Jeff Smith, Bert Scalzo, and Patrick McGrath, O'Reilly, 2005 (ISBN 0596009712, ISBN 978-0-596-00971-7) TOAD Handbook by Bert Scalzo and Dan Hotka, Sams, 2003 (ISBN 0672324865, ISBN 978-0-672-32486-4) TOAD Handbook 2nd Edition by Bert Scalzo and Dan Hotka, Addison-Wesley Professional, 2009 (ISBN 0321649109, ISBN 978-0-321-64910-2).

    Read more →
  • Irish logarithm

    Irish logarithm

    The Irish logarithm was a system of number manipulation invented by Percy Ludgate for machine multiplication. The system used a combination of mechanical cams as lookup tables and mechanical addition to sum pseudo-logarithmic indices to produce partial products, which were then added to produce results. The technique is similar to Zech logarithms (also known as Jacobi logarithms), but uses a system of indices original to Ludgate. == Concept == Ludgate's algorithm reduces the multiplication of two single-digit decimal numbers to two table lookups (to convert the digits into indices), followed by the addition of the two indices to create a new index, which is input to a second lookup table that generates the output product. Because both lookup tables are one-dimensional, and the addition of linear movements is simple to implement mechanically, this allows a less complex mechanism than would be needed to implement a two-dimensional 10×10 multiplication lookup table. Ludgate stated that he deliberately chose the values in his tables to be as small as he could make them; given this, Ludgate's tables can be simply constructed from first principles, either via pen-and-paper methods or by a systematic search using only a few tens of lines of program code. They do not correspond to Zech logarithms, Remak indexes, or Korn indexes. == Pseudocode == Ludgate's Irish logarithm algorithm can be implemented in the Python programming language using two lookup tables. Table 1 is taken from Ludgate's original paper; given Table 1, the contents of Table 2 can be trivially derived from it and the definition of the algorithm. Note that since the last third of the second table is entirely zeros, this could be exploited to further simplify a mechanical implementation of the algorithm.
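
As a minimal sketch of the two-table scheme described above, the following Python derives Table 2 from Table 1 instead of listing it by hand. The index values in `TABLE1` are assumed here to match Ludgate's published table and should be checked against the original paper; the names `TABLE1`, `TABLE2`, `build_table2`, and `product` are illustrative choices, not taken from the source.

```python
# Table 1: decimal digit -> index.
# ASSUMPTION: these values are believed to match Ludgate's published
# table; verify them against the original paper before relying on them.
TABLE1 = [50, 0, 1, 7, 2, 23, 8, 33, 3, 14]


def build_table2(table1):
    """Derive Table 2 (sum of two indices -> product) from Table 1."""
    table2 = [0] * (2 * max(table1) + 1)
    seen = {}
    for a in range(10):
        for b in range(10):
            index = table1[a] + table1[b]
            # Each index sum must identify exactly one product.
            if seen.setdefault(index, a * b) != a * b:
                raise ValueError(f"index {index} is ambiguous")
            table2[index] = a * b
    return table2


TABLE2 = build_table2(TABLE1)


def product(a, b):
    """Multiply two decimal digits: two lookups, one addition, one lookup."""
    return TABLE2[TABLE1[a] + TABLE1[b]]
```

With these values, `product(7, 8)` looks up the indices 33 and 3, adds them to get 36, and reads 56 from Table 2. The derived table has 101 entries, and every entry from index 67 onward is zero, which is consistent with the remark about the last third of the second table.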

    Read more →
  • Information

    Information

    Information is an abstract concept that refers to something which has the power to inform. At the most fundamental level, it pertains to the interpretation (perhaps formally) of that which may be sensed, or their abstractions. Any natural process that is not completely random and any observable pattern in any medium can be said to convey some amount of information. Whereas digital signals and other data use discrete signs to convey information, other phenomena and artifacts such as analogue signals, poems, pictures, music or other sounds, and currents convey information in a more continuous form. Information is not knowledge itself, but the meaning that may be derived from a representation through interpretation. The concept of information is relevant to and connected with various concepts, including constraint, communication, control, data, form, education, knowledge, meaning, understanding, mental stimuli, pattern, perception, proposition, representation, and entropy. Information is often processed iteratively: Data available at one step are processed into information to be interpreted and processed at the next step. For example, in written text each symbol or letter conveys information relevant to the word it is part of, each word conveys information relevant to the phrase it is part of, each phrase conveys information relevant to the sentence it is part of, and so on until at the final step information is interpreted and becomes knowledge in a given domain. In a digital signal, bits may be interpreted into the symbols, letters, numbers, or structures that convey the information available at the next level up. The key characteristic of information is that it is subject to interpretation and processing. The derivation of information from a signal or message may be thought of as the resolution of ambiguity or uncertainty that arises during the interpretation of patterns within the signal or message. Information may be structured as data. 
Redundant data can be compressed up to an optimal size, which is the theoretical limit of compression. The information available through a collection of data may be derived by analysis. For example, a restaurant collects data from every customer order. That information may be analyzed to produce knowledge that is put to use when the business subsequently wants to identify the most popular or least popular dish. Information can be transmitted in time, via data storage, and space, via communication and telecommunication. Information is expressed either as the content of a message or through direct or indirect observation. That which is perceived can be construed as a message in its own right, and in that sense, all information is always conveyed as the content of a message. Information can be encoded into various forms for transmission and interpretation (for example, information may be encoded into a sequence of signs, or transmitted via a signal). It can also be encrypted for safe storage and communication. The uncertainty of an event is measured by its probability of occurrence. Uncertainty is proportional to the negative logarithm of the probability of occurrence. Information theory takes advantage of this by concluding that more uncertain events require more information to resolve their uncertainty. The bit is the standard unit of information. It is 'that which reduces uncertainty by half'. Other units such as the nat may be used. For example, the information encoded in one "fair" coin flip is log2(2/1) = 1 bit, and in two fair coin flips is log2(4/1) = 2 bits. A 2011 Science article estimates that 97% of technologically stored information was already in digital bits in 2007 and that the year 2002 was the beginning of the digital age for information storage (with digital storage capacity bypassing analogue for the first time). 
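
The coin-flip figures above follow from taking the negative base-2 logarithm of an outcome's probability. A minimal Python sketch of that arithmetic is given here; the function names `self_information_bits` and `entropy_bits` are illustrative and do not come from the source.

```python
import math
from typing import Sequence


def self_information_bits(p: float) -> float:
    """Information, in bits, gained by observing an outcome of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def entropy_bits(probabilities: Sequence[float]) -> float:
    """Shannon entropy in bits: the average self-information of a distribution."""
    return sum(p * self_information_bits(p) for p in probabilities if p > 0)


one_flip = self_information_bits(1 / 2)    # one fair coin flip
two_flips = self_information_bits(1 / 4)   # one specific result of two flips
fair_coin = entropy_bits([0.5, 0.5])       # average information per fair flip
biased_coin = entropy_bits([0.9, 0.1])     # a more predictable source
```

Here `one_flip` and `two_flips` evaluate to 1 and 2 bits, matching the figures in the text. The biased coin's entropy comes out below 1 bit per flip, which illustrates why redundant (more predictable) data can be compressed further than uniformly random data.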
== Etymology and history of the concept == The English word "information" comes from Middle French enformacion/informacion/information 'a criminal investigation' and its etymon, Latin informatiō(n) 'conception, teaching, creation'. In English, "information" is an uncountable mass noun. References on "formation or molding of the mind or character, training, instruction, teaching" date from the 14th century in both English (according to the Oxford English Dictionary) and other European languages. In the transition from the Middle Ages to modernity, the use of the concept of information reflected a fundamental turn in epistemological basis – from "giving a (substantial) form to matter" to "communicating something to someone". Peters (1988, pp. 12–13) concludes: "Information was readily deployed in empiricist psychology (though it played a less important role than other words such as impression or idea) because it seemed to describe the mechanics of sensation: objects in the world inform the senses. But sensation is entirely different from 'form' – the one is sensual, the other intellectual; the one is subjective, the other objective. My sensation of things is fleeting, elusive, and idiosyncratic. For Hume, especially, sensory experience is a swirl of impressions cut off from any sure link to the real world... In any case, the empiricist problematic was how the mind is informed by sensations of the world. At first informed meant shaped by; later it came to mean received reports from. As its site of action drifted from cosmos to consciousness, the term's sense shifted from unities (Aristotle's forms) to units (of sensation). Information came less and less to refer to internal ordering or formation, since empiricism allowed for no preexisting intellectual forms outside of sensation itself. Instead, information came to refer to the fragmentary, fluctuating, haphazard stuff of sense. 
Information, like the early modern worldview in general, shifted from a divinely ordered cosmos to a system governed by the motion of corpuscles. Under the tutelage of empiricism, information gradually moved from structure to stuff, from form to substance, from intellectual order to sensory impulses." In the modern era, the most important influence on the concept of information is derived from the information theory developed by Claude Shannon and others. This theory, however, reflects a fundamental contradiction. Qvortrup (1993) wrote: "Thus, actually two conflicting metaphors are being used: The well-known metaphor of information as a quantity, like water in the water-pipe, is at work, but so is a second metaphor, that of information as a choice, a choice made by an information provider, and a forced choice made by an information receiver. Actually, the second metaphor implies that the information sent isn't necessarily equal to the information received, because any choice implies a comparison with a list of possibilities, i.e., a list of possible meanings. Here, meaning is involved, thus spoiling the idea of information as a pure 'Ding an sich.' Thus, much of the confusion regarding the concept of information seems to be related to the basic confusion of metaphors in Shannon's theory: is information an autonomous quantity, or is information always per se information to an observer? Actually, I don't think that Shannon himself chose one of the two definitions. Logically speaking, his theory implied information as a subjective phenomenon. But this had so wide-ranging epistemological impacts that Shannon didn't seem to fully realize this logical fact. Consequently, he continued to use metaphors about information as if it were an objective substance. This is the basic, inherent contradiction in Shannon's information theory." (Qvortrup, 1993, p. 5). 
In their seminal book The Study of Information: Interdisciplinary Messages, Machlup and Mansfield (1983) collected key views on the interdisciplinary controversy in computer science, artificial intelligence, library and information science, linguistics, psychology, and physics, as well as in the social sciences. Machlup (1983, p. 660) himself disagrees with the use of the concept of information in the context of signal transmission; in his view, the basic senses of information all refer "to telling something or to the something that is being told. Information is addressed to human minds and is received by human minds." All other senses, including its use with regard to nonhuman organisms as well as to society as a whole, are, according to Machlup, metaphoric and, as in the case of cybernetics, anthropomorphic. Hjørland (2007) describes the fundamental difference between objective and subjective views of information and argues that the subjective view has been supported by, among others, Bateson, Yovits, Spang-Hanssen, Brier, Buckland, Goguen, and Hjørland. Hjørland provided the following example: A stone on a field could contain different information for different people (or from one situation to another). It is not possible for information systems to map all the stone's possible information for every individual. Nor is any one mapping the one "true" mapping. But peop

    Read more →