AI For Economics Students

AI For Economics Students — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Universal portfolio algorithm

    Universal portfolio algorithm

    The universal portfolio algorithm is a portfolio selection algorithm from the field of machine learning and information theory. The algorithm learns adaptively from historical data and maximizes log-optimal growth rate in the long run, per the Kelly criterion. It was introduced by the late Stanford University information theorist Thomas M. Cover. The algorithm rebalances the portfolio at the beginning of each trading period. At the beginning of the first trading period it starts with a naive diversification. In the following trading periods the portfolio composition depends on the historical total return of all possible constant-rebalanced portfolios. The universal portfolio algorithm is the predecessor of the various online portfolio selection methodologies.
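    The rebalancing rule described above can be sketched for the two-asset case. This is a minimal illustration, not Cover's exact formulation: it assumes the continuum of constant-rebalanced portfolios is approximated by an evenly spaced grid, and the names `universal_portfolio`, `price_relatives` and `grid_size` are hypothetical. Each period's allocation is the wealth-weighted average of all grid portfolios, so the first period starts at an even split.

```python
def universal_portfolio(price_relatives, grid_size=101):
    """Discretized universal portfolio for two assets (illustrative sketch).

    price_relatives: list of (x1, x2) pairs, the per-period ratio of
    end price to start price for asset 1 and asset 2.
    Returns (weights_used, final_wealth); weights_used[t] is the fraction
    of wealth held in asset 1 during period t.
    """
    # Candidate constant-rebalanced portfolios: fraction b in asset 1.
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    crp_wealth = [1.0] * grid_size  # wealth each candidate would have so far
    wealth = 1.0
    weights_used = []
    for x1, x2 in price_relatives:
        # Allocation = average of candidates, weighted by their past wealth.
        total = sum(crp_wealth)
        b = sum(g * w for g, w in zip(grid, crp_wealth)) / total
        weights_used.append(b)
        wealth *= b * x1 + (1.0 - b) * x2
        # Update what every candidate would have earned this period.
        crp_wealth = [w * (g * x1 + (1.0 - g) * x2)
                      for g, w in zip(grid, crp_wealth)]
    return weights_used, wealth


# Asset 1 is flat; asset 2 alternately doubles and halves.
relatives = [(1.0, 2.0), (1.0, 0.5)] * 10
weights, final_wealth = universal_portfolio(relatives)
```

    In this toy market neither asset gains anything when held on its own, yet the sketch ends with more than its starting wealth, because rebalanced mixtures profit from the volatility.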

    Read more →
  • Evidence-based library and information practice

    Evidence-based library and information practice

    Evidence-based library and information practice (EBLIP), or evidence-based librarianship (EBL), is the use of evidence-based practices (EBP) in the field of library and information science (LIS). This means that all practical decisions made within LIS should 1) be based on research studies, and 2) rest on research studies that are selected and interpreted according to specific norms characteristic of EBP. Typically such norms disregard theoretical and qualitative studies and consider quantitative studies according to a narrow set of criteria for what counts as evidence. If such a narrow set of methodological criteria is not applied, it is better to speak instead of research-based library and information practice.

    == Characteristics ==

    Evidence-based practice in general has been characterised as a positivist approach; EBLIP is therefore also a positivist approach to LIS, and as such it stands in contrast to other approaches to LIS. The use of the statistical approaches known as meta-analysis to summarise the evidence reported in the literature is one of the methods typical of the evidence-based approach.

    In 2002, Booth noted that the three schools of EBLIP had some commonalities, including the context of day-to-day decision-making, an emphasis on improving the quality of professional practice, a pragmatic focus on the 'best available evidence', incorporation of the user perspective, the acceptance of a broad range of quantitative and qualitative research designs, and access, either first-hand or second-hand, to the (process of) evidence-based practice and its products. He added one more: that EBLIP is concerned with getting the best value for money.

    == The role of library and information science in EBP ==

    Evidence-based practice in general is based on a very thorough search of the scientific literature and a very thorough selection and analysis of the retrieved literature. A close familiarity with database searching is needed, and library and information professionals have important roles to play in this respect. LIS professionals should therefore be well suited to helping professionals in other disciplines do EBP. EBLIP is the application of this approach to LIS itself. It should be mentioned, however, that EBP started in medicine as evidence-based medicine (EBM), from which it spread to other fields. Only slowly and to a limited extent has EBP moved on to LIS.

    The EBLIP process can be applied to a variety of scenarios in LIS, including customer service, collection development, library management and information literacy instruction. In general, quantitative methods are used in LIS research. A 2010 study revealed five categories that capture the different ways library and information professionals experience evidence-based practice: as irrelevant; as learning from published research; as service improvement; as a way of being; and as a weapon.

    Read more →
  • Knowledge organization system

    Knowledge organization system

    Knowledge organization system (KOS), concept system, or concept scheme is the generic term used in knowledge organization (KO) for the selection of concepts with an indication of selected semantic relations. Despite their differences in type, coverage, and application, all KOS aim to support the organization of knowledge and information to facilitate their management and retrieval. KOS vary in complexity from simple sorted lists to complex relational networks. They represent both structural and functional features, and serve to eliminate ambiguity, control synonyms, establish relationships, and present properties. From their origins in library and information science (LIS), KOS have been applied to other domains and disciplines within science and industry, although scholarly research and debate remain primarily within the KO field. Challenges of KOS include ambiguity of terminology, repercussions of biased systems, and potential obsolescence.

    KOS can be expressed in RDF and RDFS as per the Simple Knowledge Organization System (SKOS) recommendation by W3C, which aims to enable the sharing and linking of KOS via the Web. One of the largest collections of KOS is the BARTOC registry.

    == Types ==

    While different schemas of KOS have been proposed, most are arranged in terms of the complexity of their construction and maintenance. Some scholars argue that organizing KOS on a spectrum oversimplifies the shared characteristics among them, and may even result in a non-ideal structure being chosen. The following types are not exhaustive, and are often not mutually exclusive in practice.

    === Term lists ===

    Term lists are the least structured form of KOS. They include lists, glossaries, dictionaries, and synonym rings. Authority files and gazetteers may also be considered term lists; however, other scholars categorize them, along with directories, as "metadata-like models". Examples include the Union List of Artist Names name authority file and the GeoNames gazetteer.

    === Categorization and classification ===

    KOS that emphasize specific (and often hierarchical) structures include subject headings, taxonomies, categorization schemas, and classification schemas and systems. Despite inconsistent use of the terms "categorization" and "classification" in some literature, categorization generally refers to loosely assembled grouping schemas that may include attributes that are not mutually exclusive (or that have fuzzy boundaries), while classification refers to the arrangement of non-overlapping, mutually exclusive classes. Classification schemas may be universal (such as Dewey Decimal Classification and Information Coding Classification) or domain-specific (such as the National Library of Medicine Classification).

    === Relationship models ===

    The most complex types of KOS, which utilize connections between concepts, include thesauri, semantic networks, and ontologies. One of the most prominent examples of a semantic network is WordNet.

    === Others ===

    Certain structures that have been proposed as types of KOS, but are not consistently included in schemas, include folksonomies, topic maps, web directory structures, publication organization systems, and bibliometric maps. Some KOS themselves organize other KOS; for instance, PeriodO is a gazetteer of periodization categories.

    == Applications ==

    Some early KOS were developed as a support system for abstracting and indexing services to be used by specially trained searchers. With the growth of information digitization, KOS became more accessible and usable, and more complex structures were developed. Prominent examples of KOS outside of LIS include organism taxonomy in biology, the periodic table of elements in chemistry, the SIC and NAICS classification systems for industry and business, and the AGROVOC agricultural controlled vocabulary.

    == Challenges ==

    The study and design of KOS is an ongoing topic of discussion among KO scholars.

    === Terminology ===

    "[There is] a serious lack of vocabulary control in the literature on controlled vocabulary."

    Inconsistency of terminology within the study of KOS is a common issue. For instance, "ontology" is used both for a specific type of KOS and as a generic term for any KOS. The terms "taxonomy", "classification", and "categorization" are also sometimes used interchangeably.

    === Bias ===

    As knowledge can be historically and culturally biased, scholars have also discussed how KOS themselves can perpetuate harmful practices or stereotypes. For example, a number of concerns and criticisms about the classification of mental disorders in the Diagnostic and Statistical Manual of Mental Disorders have been raised, contributing to ongoing revisions. Ethical and intentional design approaches have been proposed for multi-perspective KOS in efforts to mitigate bias and other harmful practices.

    === Obsolescence ===

    The possible obsolescence of the thesaurus and other simpler KOS has been the topic of debate, especially in the face of increasingly complex ontologies, the growing usage of "Google-like retrieval systems", and the move of KO theory and research away from LIS and toward computer science. Supporters of thesauri argue for their continued usefulness for metadata enrichment, vocabulary mapping, and web services, as well as in specific domains such as corporate intranets and digital image libraries.

    Read more →
  • Metadata management

    Metadata management

    Metadata management involves managing metadata about other data, whereby this "other data" is generally referred to as content data. The term is used most often in relation to digital media, but older forms of metadata are catalogs, dictionaries, and taxonomies. For example, the Dewey Decimal Classification is a metadata management system developed in 1876 for libraries.

    == Metadata schema ==

    Metadata management covers the end-to-end process and governance framework for creating, controlling, enhancing, attributing, defining and managing a metadata schema, model or other structured aggregation system, either independently or within a repository, together with the associated supporting processes (often to enable the management of content). For web-based systems, URLs, images, video etc. may be referenced from a triples table of object, attribute and value.

    == Scope ==

    Within specific knowledge domains, the boundaries of the metadata for each domain must be managed, since a general ontology is not useful to experts in one field whose language is knowledge-domain specific.

    == Metadata Manager ==

    In the process of developing a knowledge management solution, creating a metadata schema, and building a system in which metadata is managed, a dedicated resource may be appointed to maintain adherence to metadata standards as defined by data owners as well as to general best practice. This person is responsible for curation of the business and technical layers of the metadata schema, and is commonly involved with strategy and implementation. A metadata manager is not required to master all aspects or be involved with everything concerning the solution, but needs an understanding of as much of the process as possible to ensure that a relevant schema is developed.

    == Metadata management over time ==

    Managing the metadata in a knowledge management solution is an important step in a metadata strategy. It is part of the strategy to make sure that the metadata are complete, current and correct at any given time. Managing a metadata project is also about making sure that users of the system are aware of the possibilities allowed by a well-designed metadata system and how to maximize the benefits of metadata. Regularly monitoring the metadata to ensure that the schema remains relevant is advised.

    === Wikipedia metadata ===

    Wikipedia is a project that actively manages metadata for its articles and files. For example, volunteer editors carefully curate new biographical articles based on the notability (claim to fame), name, and birth and/or death dates. Similarly, volunteer editors carefully curate new architectural articles based on name, municipality, or geo coordinates. When new articles with a valid alternate spelling are added to Wikipedia and match existing articles based on metadata, they are manually checked and, if needed, tagged for merging. When new articles are added that are considered out of scope or otherwise unfit for Wikipedia, they are nominated for deletion. To help keep track of metadata on Wikipedia, the new Wikimedia project Wikidata was established in 2012.

    Read more →
  • Algorithmic probability

    Algorithmic probability

    In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the method together with Bayes' rule to obtain probabilities of prediction for an algorithm's future outputs. In the mathematical formalism used, the observations have the form of finite binary strings viewed as outputs of Turing machines, and the universal prior is a probability distribution over the set of finite binary strings calculated from a probability distribution over programs (that is, inputs to a universal Turing machine). The prior is universal in the Turing-computability sense, i.e. no string has zero probability. It is not computable, but it can be approximated.

    Formally, the probability $P$ is not a probability and it is not computable. It is only "lower semi-computable" and a "semi-measure". By "semi-measure", it means that $0 \leq \sum_{x} P(x) < 1$. That is, the "probability" does not actually sum up to one, unlike actual probabilities. This is because some inputs to the Turing machine cause it to never halt, which means the probability mass allocated to those inputs is lost. By "lower semi-computable", it means there is a Turing machine that, given an input string $x$, can print out a sequence $y_1 < y_2 < \cdots$ …

    Read more →

  • BRS/Search

    BRS/Search

    BRS/Search is a full-text database and information retrieval system. BRS/Search uses a fully inverted indexing system to store, locate, and retrieve unstructured data. It was the search engine that in 1977 powered Bibliographic Retrieval Services (BRS) commercial operations with 20 databases (including the first national commercial availability of MEDLINE); it has changed ownership several times during its development and is currently sold as Livelink ECM Discovery Server by Open Text Corporation.

    == Early development ==

    Development on what was to become BRS began as Biomedical Communications Network (BCN) at the State University of New York at Albany (SUNY). BCN, which went online in 1968, provided on-line access to nine databases, including MEDLINE and BIOSIS Previews, to large universities and medical schools primarily in the Northeast of the USA. State funding for the project was withdrawn in 1975, and Bibliographic Retrieval Services (BRS) was formed as a non-profit concern the following year. It was incorporated in May 1976 as a for-profit corporation with Ron Quake as president, Jan Egeland as vice president in charge of marketing and training, and Lloyd Palmer as vice president of systems.

    == BRS commercial operations ==

    In December 1976, the First BRS User Meeting was held in Syracuse, New York, and by January 1977 BRS started commercial operations with 20 databases (including the first national commercial availability of MEDLINE) and 9 million records, using modified IBM STAIRS (STorage And Information Retrieval System) software, Telenet for telecommunications, and timesharing mainframe computers of Carrier Corporation. In October 1980 BRS was sold by Egeland and Quake to Indian Head, Inc., a subsidiary of the Dutch company Thyssen-Bornemisza Group.

    == 1989–1993 ==

    In 1989 Robert Maxwell acquired BRS and the BRS/Search software; he announced the planned incorporation of the ORBIT Search Service and BRS Information Technologies and renamed the whole group Maxwell Online, Inc. At that time BRS Information Technologies was serving the medical and academic library marketplace with over 150 databases. Maxwell later bought the publishing company Macmillan and put Maxwell Online under Macmillan.

    In the same year BRS/LINK (hypertext connection of databases; first application delivering full text) was announced. The initial BRS/LINK application "relates the citation in a bibliographic database to its full-text article in a second database," and "eliminates the need to re-execute a search strategy in the second database in order to find the corresponding full-text article." Initially BRS/LINK supported linking only selected bibliographic databases (MEDLINE, Health Planning and Administration, and MEDLINE References on AIDS) to the full-text Comprehensive Core Medical Library.

    At the time of Robert Maxwell's death in 1991, Macmillan brought in Andrew Gregory to represent the company during the two years that Maxwell's affairs were being settled and to prepare Maxwell Online so that its components could be sold. Maxwell Online shortly thereafter underwent yet another name change, this time to InfoPro Technologies.

    == Dataware Technologies ownership of BRS/SEARCH ==

    Early in 1994, InfoPro Technologies (the former Maxwell Online service), a subsidiary of MHC Inc. (holding company for Macmillan Inc.), sold off all its subsidiaries. ORBIT Search Services went to the French-owned Questel, the dial-up BRS Search Services to CD Plus Technologies (later to become OVID), and BRS Software Products (including BRS/SEARCH) to Dataware Technologies. Almost up to the end of InfoPro Technologies, BRS Software had been the fastest growing segment of the company.

    At the 14th BRS North American Users Group Conference in 1999, Dave Schubmehl of Dataware Technologies presented a paper in which he stated "The purpose of this presentation is to update BRS users on upcoming releases of BRS/Search, NetAnswer, and other Dataware products. BRS/Search 7.0 will include features specifically requested by customers, as well as other enhancements. Earlier this year, Dataware acquired Sovereign Hill Software, makers of InQuery. In light of that acquisition, and Dataware's other development projects, we'll look at Dataware's plans for all products, including BRS/Search and NetAnswer."

    == Open Text acquisition of BRS/Search ==

    In 2001 BRS/Search was acquired by Open Text and became LiveLink ECM Discovery Server. It is now referred to as Open Text Discovery Server. Open Text still supports both BRS/Search and NetAnswer.

    The core BRS/Search technology in the Open Text portfolio was augmented with other capabilities through various acquisitions. For example, Dataware's acquisition of Sovereign-Hill brought InQuery, "a probabilistic information retrieval system using an inference network" developed by the University of Massachusetts Amherst Center for Intelligent Information Retrieval (CIIR), out of the UMass CIIR and into the marketplace. A product re-branding table shows the range of products, their old names and their new names. InQuery is a concept search engine that uses noun phrases, parts of speech and other co-occurrence relationships in overlapping passages of text rather than single-term inverted indexes of single words in documents. Open Text's portfolio has grown to include Hummingbird Content Management, and has always included BASIS.

    == 2003 ==

    The BRS/Search North America User's Group (BRSNAUG) website, dated June 8, 2003, listed the following features for BRS/Search. The BRSNAUG also disincorporated in 2003. Cross-references to BRS/Search on the World Wide Web point to Open Text Livelink.

    Engine features include:

      • Rapid query response time
      • Numerical data handling and elementary statistical processing (sum, avg, min, max)
      • Search results weighting and relevancy ranking
      • Left- and right-truncation and expansion of search terms
      • Superior data compression: loaded databases typically use only about 1.5 times the input stream size in disk space
      • Large capacity databases: up to 100 million documents, each with up to 65,000 paragraphs
      • Fine control of indexing and searching, right down to the word, sentence, and paragraph level
      • Fine control over data security: document access can be controlled at the database, document, and paragraph level
      • International language support for all 7/8-bit character sets and customizable language tables
      • Flexible and customizable stop word lists
      • ANSI-compatible thesauri
      • Hypertext links within and between documents and databases (R6.x)
      • Support for natural language parsing of queries
      • Automatic document summarization tools
      • Client/Server development
      • Programming interfaces for World-Wide Web (HTTP, HTML) access to databases

    Read more →
  • Knuth–Plass line-breaking algorithm

    Knuth–Plass line-breaking algorithm

    The Knuth–Plass algorithm is a line-breaking algorithm designed for use in Donald Knuth's typesetting program TeX. It integrates the problems of text justification and hyphenation into a single algorithm by using a discrete dynamic programming method to minimize a loss function that attempts to quantify the aesthetic qualities desired in the finished output.

    The algorithm works by dividing the text into a stream of three kinds of objects: boxes, which are non-resizable chunks of content; glue, which are flexible, resizeable elements; and penalties, which represent places where breaking is undesirable (or, if negative, desirable). The loss function, known as "badness", is defined in terms of the deformation of the glue elements and any extra penalties incurred through line breaking.

    Making hyphenation decisions follows naturally from the algorithm, but the choice of possible hyphenation points within words, and optionally their preference weighting, must be performed first, and that information inserted into the text stream in advance.

    Knuth and Plass' original algorithm does not include page breaking, but may be modified to interface with a pagination algorithm, such as the algorithm designed by Plass in his PhD thesis.

    Typically, the cost function for this technique should be modified so that it does not count the space left on the final line of a paragraph; this modification allows a paragraph to end in the middle of a line without penalty. The same technique can also be extended to take into account other factors such as the number of lines or costs for hyphenating long words.

    == Computational complexity ==

    A naive brute-force exhaustive search for the minimum badness by trying every possible combination of breakpoints would take an impractical $O(2^{n})$ time. The classic Knuth–Plass dynamic programming approach to solving the minimization problem is a worst-case $O(n^{2})$ algorithm but usually runs much faster, in close to linear time. Solving for the Knuth–Plass optimum can be shown to be a special case of the convex least-weight subsequence problem, which can be solved in $O(n)$ time. Methods to do this include the SMAWK algorithm.

    == Simple example of minimum raggedness metric ==

    For the input text AAA BB CC DDDDD with line width 6, a greedy algorithm that puts as many words on a line as possible while preserving order before moving to the next line would produce:

      ------    Line width: 6
      AAA BB    Remaining space: 0
      CC        Remaining space: 4
      DDDDD     Remaining space: 1

    The sum of squared space left over by this method is $0^{2} + 4^{2} + 1^{2} = 17$. However, the optimal solution achieves the smaller sum $3^{2} + 1^{2} + 1^{2} = 11$:

      ------    Line width: 6
      AAA       Remaining space: 3
      BB CC     Remaining space: 1
      DDDDD     Remaining space: 1

    The difference here is that the first line is broken before BB instead of after it, yielding a better right margin and a lower cost of 11.
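    The minimum raggedness example can be reproduced with a short dynamic program. This is only a sketch of the simplified squared-leftover-space metric from the example, not the full box/glue/penalty model of Knuth and Plass; the function names are hypothetical, and it assumes every word fits on a line and that the last line is counted, as in the example.

```python
def raggedness(lines, width):
    """Sum of squared leftover space over all lines."""
    return sum((width - len(line)) ** 2 for line in lines)


def greedy_lines(words, width):
    """Put as many words as possible on each line, in order."""
    lines, current = [], ""
    for word in words:
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def min_raggedness(words, width):
    """Dynamic program, worst case O(n^2): best[j] is the minimum cost
    of laying out the first j words; prev[j] is where the last line starts."""
    n = len(words)
    inf = float("inf")
    best = [0] + [inf] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        length = -1  # cancels the space added before the first word
        for i in range(j, 0, -1):  # candidate last line = words i..j
            length += len(words[i - 1]) + 1
            if length > width:
                break
            cost = best[i - 1] + (width - length) ** 2
            if cost < best[j]:
                best[j], prev[j] = cost, i - 1
    if best[n] == inf:
        raise ValueError("some word is longer than the line width")
    lines, j = [], n
    while j > 0:  # walk the breakpoints backwards
        lines.append(" ".join(words[prev[j]:j]))
        j = prev[j]
    lines.reverse()
    return best[n], lines


words = "AAA BB CC DDDDD".split()
print(raggedness(greedy_lines(words, 6), 6))  # 17
print(min_raggedness(words, 6))               # (11, ['AAA', 'BB CC', 'DDDDD'])
```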

    Read more →
  • Reservoir sampling

    Reservoir sampling

    Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far.

    == Motivation ==

    Suppose we see a sequence of items, one at a time. We want to keep 10 items in memory, and we want them to be selected at random from the sequence. If we know the total number of items n and can access the items arbitrarily, then the solution is easy: select 10 distinct indices i between 1 and n with equal probability, and keep the i-th elements. The problem is that we do not always know the exact n in advance.

    == Simple: Algorithm R ==

    A simple and popular but slow algorithm, Algorithm R, was created by Jeffrey Vitter.

    Initialize an array $R$ indexed from $1$ to $k$, containing the first k items of the input $x_{1}, \ldots, x_{k}$. This is the reservoir.

    For each new input $x_{i}$, generate a random number j uniformly in $\{1, \ldots, i\}$. If $j \in \{1, \ldots, k\}$, then set $R[j] := x_{i}$. Otherwise, discard $x_{i}$.

    Return $R$ after all inputs are processed.

    This algorithm works by induction on $i \geq k$.

    While conceptually simple and easy to understand, this algorithm needs to generate a random number for each item of the input, including the items that are discarded.
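    The steps of Algorithm R described above can be sketched as follows. The function name `reservoir_sample` and the optional `rng` parameter are illustrative choices, not part of the original description.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Algorithm R: keep a uniform sample of k items from a stream
    of unknown length, using one random number per item after the first k."""
    if rng is None:
        rng = random.Random()
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # j is uniform in {1, ..., i}, endpoints included.
            j = rng.randint(1, i)
            if j <= k:
                reservoir[j - 1] = item  # replace slot j (1-based)
            # otherwise the item is discarded
    return reservoir


sample = reservoir_sample(range(1_000_000), 10)
```

    If the stream holds fewer than k items, the sketch simply returns all of them.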
    The algorithm's asymptotic running time is thus $O(n)$. Generating this amount of randomness and the linear run time cause the algorithm to be unnecessarily slow if the input population is large.

    == Optimal: Algorithm L ==

    If we generate $n$ random numbers $u_{1}, \ldots, u_{n} \sim U[0,1]$ independently, then the indices of the smallest $k$ of them are a uniform sample of the $k$-subsets of $\{1, \ldots, n\}$.

    The process can be done without knowing $n$: keep the smallest $k$ of $u_{1}, \ldots, u_{i}$ that have been seen so far, as well as $w_{i}$, the index of the largest among them. For each new $u_{i+1}$, compare it with $u_{w_{i}}$. If $u_{i+1} < u_{w_{i}}$ …

    Read more →

  • Model-based clustering

    Model-based clustering

    In statistics, cluster analysis is the algorithmic grouping of objects into homogeneous groups based on numerical measurements. Model-based clustering is based on a statistical model for the data, usually a mixture model. This has several advantages, including a principled statistical basis for clustering, and ways to choose the number of clusters, to choose the best clustering model, to assess the uncertainty of the clustering, and to identify outliers that do not belong to any group.

    == Model-based clustering ==

    Suppose that for each of $n$ observations we have data on $d$ variables, denoted by $y_{i} = (y_{i,1}, \ldots, y_{i,d})$ for observation $i$. Then model-based clustering expresses the probability density function of $y_{i}$ as a finite mixture, or weighted average, of $G$ component probability density functions:

    $$p(y_{i}) = \sum_{g=1}^{G} \tau_{g} f_{g}(y_{i} \mid \theta_{g}),$$

    where $f_{g}$ is a probability density function with parameter $\theta_{g}$, and $\tau_{g}$ is the corresponding mixture probability, with $\sum_{g=1}^{G} \tau_{g} = 1$. Then in its simplest form, model-based clustering views each component of the mixture model as a cluster, estimates the model parameters, and assigns each observation to the cluster corresponding to its most likely mixture component.

    === Gaussian mixture model ===

    The most common model for continuous data is that $f_{g}$ is a multivariate normal distribution with mean vector $\mu_{g}$ and covariance matrix $\Sigma_{g}$, so that $\theta_{g} = (\mu_{g}, \Sigma_{g})$. This defines a Gaussian mixture model.
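    As a rough illustration of the mixture density and the assignment rule above, the following sketch fits a one-dimensional Gaussian mixture (so $d = 1$ and each $\Sigma_{g}$ is a single variance) with a basic expectation-maximization loop, then assigns a point to its most likely component. The function names, the toy data, the starting values and the variance floor are all assumptions made for the example; real analyses would normally rely on an established statistics package.

```python
import math


def normal_pdf(y, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(y, taus, means, variances):
    """Posterior probability of each component given y:
    tau_g * f_g(y) divided by the mixture density p(y)."""
    weighted = [t * normal_pdf(y, m, v)
                for t, m, v in zip(taus, means, variances)]
    total = sum(weighted)  # the mixture density p(y)
    return [w / total for w in weighted]


def em_fit(data, taus, means, variances, iterations=50):
    """Plain EM for a 1-D Gaussian mixture (no convergence check)."""
    taus, means, variances = list(taus), list(means), list(variances)
    for _ in range(iterations):
        # E-step: soft assignment of every observation.
        resp = [responsibilities(y, taus, means, variances) for y in data]
        # M-step: weighted re-estimation of each component.
        for g in range(len(taus)):
            n_g = sum(r[g] for r in resp)
            taus[g] = n_g / len(data)
            means[g] = sum(r[g] * y for r, y in zip(resp, data)) / n_g
            var = sum(r[g] * (y - means[g]) ** 2
                      for r, y in zip(resp, data)) / n_g
            variances[g] = max(var, 1e-6)  # floor avoids degenerate variance
    return taus, means, variances


def assign(y, taus, means, variances):
    """Index of the most likely mixture component for y."""
    r = responsibilities(y, taus, means, variances)
    return max(range(len(r)), key=r.__getitem__)


data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
fit = em_fit(data, taus=[0.5, 0.5], means=[0.0, 6.0], variances=[1.0, 1.0])
cluster = assign(4.7, *fit)
```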
The parameters of the model, τ g {\displaystyle \tau _{g}} and θ g {\displaystyle \theta _{g}} for g = 1 , … , G {\displaystyle g=1,\ldots ,G} , are typically estimated by maximum likelihood estimation using the expectation-maximization algorithm (EM); see also EM algorithm and GMM model. Bayesian inference is also often used for inference about finite mixture models. The Bayesian approach also allows for the case where the number of components, G {\displaystyle G} , is infinite, using a Dirichlet process prior, yielding a Dirichlet process mixture model for clustering. === Choosing the number of clusters === An advantage of model-based clustering is that it provides statistically principled ways to choose the number of clusters. Each different choice of the number of groups G {\displaystyle G} corresponds to a different mixture model. Then standard statistical model selection criteria such as the Bayesian information criterion (BIC) can be used to choose G {\displaystyle G} . The integrated completed likelihood (ICL) is a different criterion designed to choose the number of clusters rather than the number of mixture components in the model; these will often be different if highly non-Gaussian clusters are present. === Parsimonious Gaussian mixture model === For data with high dimension, d {\displaystyle d} , using a full covariance matrix for each mixture component requires estimation of many parameters, which can result in a loss of precision, generalizabity and interpretability. Thus it is common to use more parsimonious component covariance matrices exploiting their geometric interpretation. Gaussian clusters are ellipsoidal, with their volume, shape and orientation determined by the covariance matrix. 
Consider the eigendecomposition of a matrix Σ g = λ g D g A g D g T , {\displaystyle \Sigma _{g}=\lambda _{g}D_{g}A_{g}D_{g}^{T},} where D g {\displaystyle D_{g}} is the matrix of eigenvectors of Σ g {\displaystyle \Sigma _{g}} , A g = diag { A 1 , g , … , A d , g } {\displaystyle A_{g}={\mbox{diag}}\{A_{1,g},\ldots ,A_{d,g}\}} is a diagonal matrix whose elements are proportional to the eigenvalues of Σ g {\displaystyle \Sigma _{g}} in descending order, and λ g {\displaystyle \lambda _{g}} is the associated constant of proportionality. Then λ g {\displaystyle \lambda _{g}} controls the volume of the ellipsoid, A g {\displaystyle A_{g}} its shape, and D g {\displaystyle D_{g}} its orientation. Each of the volume, shape and orientation of the clusters can be constrained to be equal (E) or allowed to vary (V); the orientation can also be spherical, with identical eigenvalues (I). This yields 14 possible clustering models, shown in this table: It can be seen that many of these models are more parsimonious, with far fewer parameters than the unconstrained model that has 90 parameters when G = 4 {\displaystyle G=4} and d = 9 {\displaystyle d=9} . Several of these models correspond to well-known heuristic clustering methods. For example, k-means clustering is equivalent to estimation of the EII clustering model using the classification EM algorithm. The Bayesian information criterion (BIC) can be used to choose the best clustering model as well as the number of clusters. It can also be used as the basis for a method to choose the variables in the clustering model, eliminating variables that are not useful for clustering. Different Gaussian model-based clustering methods have been developed with an eye to handling high-dimensional data. These include the pgmm method, which is based on the mixture of factor analyzers model, and the HDclassif method, based on the idea of subspace clustering. 
The mixture-of-experts framework extends model-based clustering to include covariates. == Example == We illustrate the method with a dataset consisting of three measurements (glucose, insulin, sspg) on 145 subjects for the purpose of diagnosing diabetes and the type of diabetes present. The subjects were clinically classified into three groups: normal, chemical diabetes and overt diabetes, but we use this information only for evaluating clustering methods, not for classifying subjects. The BIC plot shows the BIC values for each combination of the number of clusters, $G$, and the clustering model from the Table. Each curve corresponds to a different clustering model. The BIC favors 3 groups, which corresponds to the clinical assessment. It also favors the unconstrained covariance model, VVV. This fits the data well, because the normal patients have low values of both sspg and insulin, while the distributions of the chemical and overt diabetes groups are elongated, but in different directions. Thus the volumes, shapes and orientations of the three groups are clearly different, and so the unconstrained model is appropriate, as selected by the model-based clustering method. The classification plot shows the classification of the subjects by model-based clustering. The classification was quite accurate, with a 12% error rate as defined by the clinical classification. Other well-known clustering methods performed worse, with higher error rates: single-linkage clustering with 46%, average-linkage clustering with 30%, complete-linkage clustering also with 30%, and k-means clustering with 28%. == Outliers in clustering == An outlier in clustering is a data point that does not belong to any of the clusters. One way of modeling outliers in model-based clustering is to include an additional mixture component that is very dispersed, with for example a uniform distribution. 
Another approach is to replace the multivariate normal densities by $t$-distributions, with the idea that the long tails of the $t$-distribution would ensure robustness to outliers. However, this is not breakdown-robust. A third approach is the "tclust" or data trimming approach which excludes observations identified as outliers when estimating the model parameters. == Non-Gaussian clusters and merging == Sometimes one or more clusters deviate strongly from the Gaussian assumption. If a Gaussian mixture is fitted to such data, a strongly non-Gaussian cluster will often be represented by several mixture components rather than a single one. In that case, cluster merging can be used to find a better clustering. A different approach is to use mixtures of complex component densities to represent non-Gaussian clusters. == Non-continuous data == === Categorical data === Clustering multivariate categorical data is most often done using the latent class model. This assumes that the data arise from a finite mixture model, where within each cluster the variables are independent. === Mixed data === These arise when variables are of different types, such as continuous, categorical or ordinal data. A latent class model for mixed data assumes local independence between the variables. The location model relaxes the local independence assumption. The clustMD approach assumes that the observed variables are manifestations of underlying continuous Gaussian latent

    Read more →
  • OpenSMILE

    OpenSMILE

    openSMILE is source-available software for automatic extraction of features from audio signals and for classification of speech and music signals. "SMILE" stands for "Speech & Music Interpretation by Large-space Extraction". The software is mainly applied in the area of automatic emotion recognition and is widely used in the affective computing research community. The openSMILE project has existed since 2008 and has been maintained by the German company audEERING GmbH since 2013. openSMILE is provided free of charge for research purposes and personal use under a source-available license. For commercial use of the tool, the company audEERING offers custom license options. == Application Areas == openSMILE is used for academic research as well as for commercial applications in order to automatically analyze speech and music signals in real time. In contrast to automatic speech recognition, which extracts the spoken content from a speech signal, openSMILE is capable of recognizing the characteristics of a given speech or music segment. Examples of such characteristics encoded in human speech are a speaker's emotion, age, gender, and personality, as well as speaker states like depression, intoxication, or vocal pathological disorders. The software further includes music classification technology for automatic music mood detection and recognition of chorus segments, key, chords, tempo, meter, dance-style, and genre. The openSMILE toolkit serves as a benchmark in many research competitions such as Interspeech ComParE, AVEC, MediaEval, and EmotiW. == History == The openSMILE project was started in 2008 by Florian Eyben, Martin Wöllmer, and Björn Schuller at the Technical University of Munich within the European Union research project SEMAINE. The goal of the SEMAINE project was to develop a virtual agent with emotional and social intelligence. In this system, openSMILE was applied for real-time analysis of speech and emotion. 
The final SEMAINE software release is based on openSMILE version 1.0.1. In 2009, the emotion recognition toolkit (openEAR) was published based on openSMILE. "EAR" stands for "Emotion and Affect Recognition". In 2010, openSMILE version 1.0.1 was published, and it was presented and received an award at the ACM Multimedia Open-Source Software Challenge. Between 2011 and 2013, the technology of openSMILE was extended and improved by Florian Eyben and Felix Weninger in the context of their doctoral theses at the Technical University of Munich. The software was also applied in the project ASC-Inclusion, which was funded by the European Union. For this project, the software was extended by Erik Marchi in order to teach emotional expression to autistic children, based on automatic emotion recognition and visualization. In 2013, the company audEERING acquired the rights to the code-base from the Technical University of Munich and version 2.0 was published under a source-available research license. By 2016, openSMILE had been downloaded more than 50,000 times worldwide and had established itself as a standard toolkit for emotion recognition. == Awards == openSMILE received an award in 2010 at the ACM Multimedia Open Source Competition. The software tool is applied in numerous scientific publications on automatic emotion recognition. openSMILE and its extension openEAR have been cited in more than 1000 scientific publications to date.

    Read more →
  • Upper ontology

    Upper ontology

    In information science, an upper ontology (also known as a top-level ontology, upper model, or foundation ontology) is an ontology (in the sense used in information science) that consists of very general terms (such as "object", "property", "relation") that are common across all domains. An important function of an upper ontology is to support broad semantic interoperability among a large number of domain-specific ontologies by providing a common starting point for the formulation of definitions. Terms in the domain ontology are ranked under the terms in the upper ontology, e.g., the upper ontology classes are superclasses or supersets of all the classes in the domain ontologies. A number of upper ontologies have been proposed, each with its own proponents. Library classification systems predate upper ontology systems. Though library classifications organize and categorize knowledge using general concepts that are the same across all knowledge domains, neither system is a replacement for the other. == Development == Any standard foundational ontology is likely to be contested among different groups, each with its own idea of "what exists". One factor exacerbating the failure to arrive at a common approach has been the lack of open-source applications that would permit the testing of different ontologies in the same computational environment. The differences have thus been debated largely on theoretical grounds, or are merely the result of personal preferences. Foundational ontologies can however be compared on the basis of adoption for the purposes of supporting interoperability across domain ontologies. No particular upper ontology has yet gained widespread acceptance as a de facto standard. Different organizations have attempted to define standards for specific domains. The 'Process Specification Language' (PSL) created by the National Institute of Standards and Technology (NIST) is one example. 
Another important factor leading to the absence of wide adoption of any existing upper ontology is the complexity. Some upper ontologies—Cyc is often cited as an example in this regard—are very large, ranging up to thousands of elements (classes, relations), with complex interactions among them and with a complexity similar to that of a human natural language, and the learning process can be even longer than for a natural language because of the unfamiliar format and logical rules. The motivation to overcome this learning barrier is largely absent because of the paucity of publicly accessible examples of use. As a result, those building domain ontologies for local applications tend to create the simplest possible domain-specific ontology, not related to any upper ontology. Such domain ontologies may function adequately for the local purpose, but they are very time-consuming to relate accurately to other domain ontologies. To solve this problem, some genuinely top level ontologies have been developed, which are deliberately designed to have minimal overlap with any domain ontologies. Examples are Basic Formal Ontology and the DOLCE (see below). === Arguments for the infeasibility of an upper ontology === Historically, many attempts in many societies have been made to impose or define a single set of concepts as more primal, basic, foundational, authoritative, true or rational than all others. A common objection to such attempts points out that humans lack the sort of transcendent perspective — or God's eye view — that would be required to achieve this goal. Humans are bound by language or culture, and so lack the sort of objective perspective from which to observe the whole terrain of concepts and derive any one standard. 
Thomasson, under the headline "1.5 Skepticism about Category Systems", wrote: "category systems, at least as traditionally presented, seem to presuppose that there is a unique true answer to the question of what categories of entity there are – indeed the discovery of this answer is the goal of most such inquiries into ontological categories. [...] But actual category systems offered vary so much that even a short survey of past category systems like that above can undermine the belief that such a unique, true and complete system of categories may be found. Given such a diversity of answers to the question of what the ontological categories are, by what criteria could we possibly choose among them to determine which is uniquely correct?" Another objection is the problem of formulating definitions. Top level ontologies are designed to maximize support for interoperability across a large number of terms. Such ontologies must therefore consist of terms expressing very general concepts, but such concepts are so basic to our understanding that there is no way in which they can be defined, since the very process of definition implies that a less basic (and less well understood) concept is defined in terms of concepts that are more basic and so (ideally) more well understood. Very general concepts can often only be elucidated, for example by means of examples, or paraphrase. There is no self-evident way of dividing the world up into concepts, and certainly no non-controversial one. There is no neutral ground that can serve as a means of translating between specialized (or "lower" or "application-specific") ontologies. Human language itself is already an arbitrary approximation of just one among many possible conceptual maps. To draw any necessary correlation between English words and any number of intellectual concepts, that we might like to represent in our ontologies, is just asking for trouble. 
(WordNet, for instance, is successful and useful, precisely because it does not pretend to be a general-purpose upper ontology; rather, it is a tool for semantic / syntactic / linguistic disambiguation, which is richly embedded in the particulars and peculiarities of the English language.) Any hierarchical or topological representation of concepts must begin from some ontological, epistemological, linguistic, cultural, and ultimately pragmatic perspective. Such pragmatism does not allow for the exclusion of politics between persons or groups, indeed it requires they be considered as perhaps more basic primitives than any that are represented. Those who doubt the feasibility of general purpose ontologies are more inclined to ask "what specific purpose do we have in mind for this conceptual map of entities and what practical difference will this ontology make?" This pragmatic philosophical position surrenders all hope of devising the encoded ontology version of "The world is everything that is the case." (Wittgenstein, Tractatus Logico-Philosophicus). Finally, there are objections similar to those against artificial intelligence. Technically, the complex concept acquisition and the social / linguistic interactions of human beings suggest any axiomatic foundation of "most basic" concepts must be cognitive biological or otherwise difficult to characterize since we don't have axioms for such systems. Ethically, any general-purpose ontology could quickly become an actual tyranny by recruiting adherents into a political program designed to propagate it and its funding means, and possibly defend it by violence. Historically, inconsistent and irrational belief systems have proven capable of commanding obedience to the detriment or harm of persons both inside and outside a society that accepts them. How much more harmful would a consistent rational one be, were it to contain even one or two basic assumptions incompatible with human life? 
=== Arguments for the feasibility of an upper ontology === Many of those who doubt the possibility of developing wide agreement on a common upper ontology fall into one of two traps: they assert that there is no possibility of universal agreement on any conceptual scheme; but they argue that a practical common ontology does not need to have universal agreement, it only needs a large enough user community (as is the case for human languages) to make it profitable for developers to use it as a means to general interoperability, and for third-party developer to develop utilities to make it easier to use; and they point out that developers of data schemes find different representations congenial for their local purposes; but they do not demonstrate that these different representations are in fact logically inconsistent. In fact, different representations of assertions about the real world (though not philosophical models), if they accurately reflect the world, must be logically consistent, even if they focus on different aspects of the same physical object or phenomenon. If any two assertions about the real world are logically inconsistent, one or both must be wrong, and that is a topic for experimental investigation, not for ontological representation. In practice, representations of the real world are created as and known to be approximations to the basic reality, and their use is circumscribed by the limits of e

    Read more →
  • Mobile content management system

    Mobile content management system

    A mobile content management system (MCMS) is a type of content management system (CMS) capable of storing and delivering content and services to mobile devices, such as mobile phones, smart phones, and PDAs. Mobile content management systems may be discrete systems, or may exist as features, modules or add-ons of larger content management systems capable of multi-channel content delivery. Mobile content delivery has unique, specific constraints including widely variable device capacities, small screen size, limitations on wireless bandwidth, sometimes small storage capacity, and (for some devices) comparatively weak device processors. Demand for mobile content management increased as mobile devices became increasingly ubiquitous and sophisticated. MCMS technology initially focused on the business to consumer (B2C) mobile market place with ringtones, games, text-messaging, news, and other related content. Since then, mobile content management systems have also taken root in business-to-business (B2B) and business-to-employee (B2E) situations, allowing companies to provide more timely information and functionality to business partners and mobile workforces in an increasingly efficient manner. A 2008 estimate put global revenue for mobile content management at US$8 billion. == Key features == === Multi-channel content delivery === Multi-channel content delivery capabilities allow users to manage a central content repository while simultaneously delivering that content to mobile devices such as mobile phones, smartphones, tablets and other mobile devices. Content can be stored in a raw format (such as Microsoft Word, Excel, PowerPoint, PDF, Text, HTML etc.) to which device-specific presentation styles can be applied. === Content access control === Access control includes authorization, authentication, and access approval for each content item. In many cases the access control also includes download control, wipe-out for specific users, and time-specific access. 
For authentication, an MCMS should provide basic authentication with a user ID and password. For higher security, many MCMSs support IP authentication and mobile device authentication. === Specialized templating system === While traditional web content management systems handle templates for only a handful of web browsers, mobile CMS templates must be adapted to the very wide range of target devices with different capacities and limitations. There are two approaches to adapting templates: multi-client and multi-site. The multi-client approach makes it possible to see all versions of a site at the same domain (e.g. sitename.com), and templates are presented based on the device client used for viewing. The multi-site approach displays the mobile site on a targeted sub-domain (e.g. mobile.sitename.com). === Location-based content delivery === Location-based content delivery provides targeted content, such as information, advertisements, maps, directions, and news, to mobile devices based on current physical location. Currently, GPS (global positioning system) navigation systems offer the most popular location-based services. Navigation systems are specialized systems, but incorporating mobile phone functionality makes greater exploitation of location-aware content delivery possible.
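The two template-adaptation approaches described under the specialized templating system can be contrasted with a small sketch. Everything here is hypothetical and deliberately simplified: the function names, the template file names, and the short list of user-agent markers are assumptions for illustration, and real systems use far more thorough device detection.

```python
# Hypothetical, deliberately short list of substrings that suggest a mobile client.
MOBILE_MARKERS = ("mobile", "android", "iphone")


def select_template_multi_client(user_agent: str) -> str:
    """Multi-client: one domain, template chosen from the requesting device client."""
    ua = user_agent.lower()
    if any(marker in ua for marker in MOBILE_MARKERS):
        return "mobile.html"
    return "desktop.html"


def select_template_multi_site(host: str) -> str:
    """Multi-site: the template follows the sub-domain the visitor requested."""
    return "mobile.html" if host.lower().startswith("mobile.") else "desktop.html"


print(select_template_multi_client("Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)"))
print(select_template_multi_site("mobile.sitename.com"))
```

In the multi-client case the decision depends on the client making the request, whereas in the multi-site case it depends only on the address requested.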

    Read more →
  • VieON

    VieON

    VieON is a mobile application for television and video on demand provided by VieON Joint Stock Company (formerly Dzones), a subsidiary of DatVietVAC Media and Entertainment Group in Vietnam. The app was launched in 2020, featuring over 140 domestic and international television channels, original series, popular entertainment programs known nationwide, top-tier sports events and live streaming of major events. Additionally, VieON provides animated films, television series and television programs from various countries such as South Korea and China. == History == The application was planned for development in 2016, with the cooperation of strategic consulting partner BCG Digital Ventures from the United States. Prior to 2020, VieON was a rebranded version of VTVcab ON, a product managed by Vietnam Cable Television Corporation (VTVCab) and DatVietVAC. On June 15, 2020, after four years of research and testing, the new version of VieON was officially released by DatVietVAC Group, with Vie Channel Joint Stock Company as the business entity and service provider. This is considered the official launch date of the application. On July 21, 2023, VieON transitioned its business operations and service provision to VieON Joint Stock Company. In January 2024, VieON officially launched its global version, VieON Global, targeting Vietnamese users living abroad. == Background == According to Kantar Media Vietnam, up to 84% of Vietnamese people aged 15–54 use social media daily, and in a similar survey by Nielsen, 90% of respondents said they watch live TV weekly. Additionally, according to research organization Muvi, Southeast Asia's OTT market revenue could reach $650 million annually starting next year. With this in mind, DatVietVAC Group planned to research and develop an OTT application, even though the Vietnamese market already had some major players such as FPT Play and the international giant Netflix. 
Additionally, DatVietVAC does not hide its ambition to make this application the number one entertainment channel for Vietnamese people.

    Read more →
  • Information professional

    Information professional

    The term information professional or information specialist refers to professionals responsible for the collection, documentation, organization, storage, preservation, retrieval, and dissemination of printed and digital information. The service delivered to the client is known as an information service. The term "information professional" is a versatile one, used to describe similar and sometimes overlapping professions, such as librarians, archivists, information managers, information systems specialists, information scientists, records managers, and information consultants. However, terminology differs among sources and organisations. Information professionals are employed in a variety of private, public, and academic institutions, as well as independently. == Skills == Since the term information professional is broad, the skills required for this profession are also varied. A Gartner report in 2011 pointed out that "Professional roles focused on information management will be different to that of established IT roles. An 'information professional' will not be one type of role or skill set, but will in fact have a number of specializations". Thus, an information professional can possess a variety of different skills, depending on the sector in which the person is employed. Some essential cross-sector skills are: IT skills, such as word-processing and spreadsheets, digitisation skills, and conducting Internet searches, together with skills in using loan systems, databases, content management systems, and specially designed programmes and packages. Customer service. An information professional should have the ability to address the information needs of customers. Language proficiency. This is essential in order to manage the information at hand and deal with customer needs. Soft skills. These include skills such as negotiating, conflict resolution, and time management. Management training. 
An information professional should be familiar with notions such as strategic planning and project management. Moreover, an information professional should be skilled in planning and using relevant systems, in capturing and securing information, and in accessing it to deliver service whenever the information is required. == Associations == Most countries have a professional association that oversees the professional and academic standards of librarians and other information professionals. There are also international associations related to LIS (library and information science), the most prominent of which is the International Federation of Library Associations and Institutions (IFLA). In many countries, LIS courses are accredited by the relevant professional association, such as the American Library Association (ALA) in the USA, the Chartered Institute of Library and Information Professionals (CILIP) in the UK, and the Australian Library and Information Association (ALIA) in Australia. == Qualifications == Educational institutions around the world offer academic degrees, or degrees on related subjects such as Archival Studies, Information Systems, Information Management, and Records Management. Some of the institutions offering information science education refer to themselves as an iSchool, such as the CiSAP (Consortium of iSchools Asia Pacific, founded 2006) in Asia and the iSchool Caucus in the USA. There are also online e-learning resources, some of which offer certification for information professionals. === Africa === Information development in Africa started later than in other continents, mainly due to a lack of internet access, expertise and resources to manage digital infrastructure, and "opportunities for capacity development and knowledge-sharing". 
Nowadays, academic degrees in information studies are available at many universities of African countries, such as the University of Pretoria (South Africa), University of Nairobi (Kenya), Makerere University (Uganda), University of Botswana (Botswana), and University of Nigeria (Nigeria). === Asia === LIS-related studies are available in more than 30 Asian countries. Some examples listed by iSchools Inc. are the University of Hong Kong, University of Tsukuba, Japan, Yonsei University, South Korea, National Taiwan University and Wuhan University, China. Centre of Library and Information Management Science (CLIMS) at Tata Institute of Social Science in Mumbai, India. In Southeast Asia, the Congress of Southeast Asian Librarians (CONSAL) connects librarians and libraries in more than 10 countries with resources, networking opportunities, and support for growing library systems. === Australasia === The Australian Library and Information Association (ALIA) as of 2021 lists six schools offering undergraduate and postgraduate accredited university courses for "Librarian and Information Specialists" on their website. In New Zealand, the Open Polytechnic of New Zealand and the Victoria University of Wellington offer undergraduate and postgraduate degree courses for information professionals. === Europe === The majority of European countries have universities, colleges, or schools which offer bachelor's degrees in LIS studies. Over 40 universities offer master's degrees in LIS-related fields, and many institutions, such as the Swedish School of Library and Information Science at the University of Borås (Sweden), the University of Barcelona (Spain), Loughborough University (UK), and Aberystwyth University (Wales, UK) also offer PhD degrees. === North America === Information studies and degrees are available at numerous academic institutions throughout the U.S. and Canada. U.S. 
professional associations, together with their European counterparts, have undertaken many educational initiatives and pioneered many advances in the field of Information studies, such as increased interdisciplinarity and more effective delivery of distance learning. The Association for Intelligent Information Management, based in Silver Spring, Maryland, offers a qualification called Certified Information Professional (CIP), earned upon passing an examination, with certification remaining valid for three years. === South America === There are many schools and colleges in Latin America, which offer courses in Library Science, Archival Studies, and Information Studies, however these subjects are taught completely separately.

    Read more →
  • Sikidy

    Sikidy

    Sikidy is a form of algebraic geomancy practiced by Malagasy peoples in Madagascar. It involves algorithmic operations performed on random data generated from tree seeds, which are ritually arranged in a tableau called a toetry and divinely interpreted after being mathematically operated on. Columns of seeds, designated "slaves" or "princes" belonging to respective "lands" for each, interact symbolically to express vintana ('fate') in the interpretation of the diviner. The diviner also prescribes solutions to problems and ways to avoid fated misfortune, often involving a sacrifice. The centuries-old practice derives from Islamic influence brought to the island by medieval Arab traders. The sikidy is consulted for a range of divinatory questions pertaining to fate and the future, including identifying sources of and rectifying misfortune, reading the fate of newborns, and planning annual migrations. The mathematics of sikidy involves Boolean algebra, symbolic logic and parity. == History == The practice is several centuries old, and is influenced by Arab geomantic traditions of Arab Muslim traders on the island. Most writers link the origins of sikidy to the "sea-going trade involving the southwest coast of India, the Persian Gulf, and the east coast of Africa in the 9th or 10th century C.E." Stephen Ellis and Solofo Randrianja describe sikidy as "probably one of the oldest components of Malagasy culture", writing that it is most likely the product of an indigenous divinatory art later influenced by Islamic practice. Umar H. D. Danfulani writes that the integration of Arabic divination into indigenous divination is "clearly demonstrated" in Madagascar, where the Arabic astrological system was adapted to the indigenous agricultural system and meshed with Malagasy lunar months by "adapting indigenous months, volana, to the astrological months, vintana". 
Danfulani also describes the concepts in sikidy of "houses" (lands) and "kings in their houses" as retained from medieval Arabic astrology. Chemillier et al. say the practice's spread across Madagascar likely originated with the southeastern Antemoro people, among whom Arab influence was the strongest. Though the etymology of sikidy is unknown, it has been posited that the word derives from the Arabic sichr ('incantation' or 'charm'). Sikidy was of central importance to pre-Christian Malagasy religion, with one practitioner quoted in 1892 as calling sikidy "the Bible of our ancestors". A missionary report from 1616 describes one form of sikidy using tamarind seeds, and another using fingered markings in the sand. The early colonial French governor of Madagascar Étienne de Flacourt documented sikidy in the mid-17th century: Matatane country in southeastern Madagascar [...] where the Antemoro [...] live was a center of astrological study as early as the fourteenth century [...]. This area was also the site of early Arab settlements, although strict Islamic observances were lost centuries ago [...]. Historical evidence shows that Antemoro diviners, bearers of the astrological system, infiltrated nearly all the ancient kingdoms of Madagascar beginning in the sixteenth century. [...] Today, although many persons claim to be ombiasy [diviners], only the Antemoro diviners are considered true professionals. The area is still a famous place of learning where specialists go for training and then return to their home communities with a certain body of knowledge. Now we can better understand the degree of similarity of divination forms found throughout Madagascar. For centuries Matitanana has remained a training center for diviners who have migrated widely, usually attaining important positions in their home communities and with various royal families. 
Comparison of contemporary rites with centuries-old texts shows that sikidy has been remarkably unchanged throughout its history. The "infiltration" of Malagasy kingdoms by Antemoro diviners, and Matitanana's role as a place for astrological and divinatory learning, help to explain the relatively uniform practice of sikidy across Madagascar. Chemillier et al. write that the mathematical construction of the arrangement of seeds is procedurally consistent across all of Madagascar, with variations in practice between groups and regions being limited to more minor aspects, such as the alignment of figures according to cardinal directions. One exception is the simplified Merina sikidy joria. === Origin myths === Mythic tradition relating to the origin of sikidy "links [the practice] both to the return by walking on water of Arab ancestors who had intermarried with Malagasy but then left, and to the names of the days of the week" and holds that the art was supernaturally communicated to the ancestors, with Zanahary (the supreme deity of Malagasy religion) giving it to Ranakandriana, who then gave it to a line of diviners (Ranakandriana to Ramanitralanana to Rabibi-andrano to Andriambavi-maitso (who was a woman) to Andriam-bavi-nosy), the last of whom terminated the monopoly by giving it to the people, declaring: "Behold, I give you the sikidy, of which you may inquire what offerings you should present in order to obtain blessings; and what expiation you should make so as to avert evils, when any are ill or under apprehension of some future calamity". A mythic anecdote of Ranakandriana says that two men observed him one day playing in the sand. In fact he was practicing a form of sikidy worked in sand called sikidy alanana. The two men seized him, and Ranakandriana promised that he would teach them something if they released him. They agreed, and Ranakandriana taught them in depth how to work the sikidy. 
The two men then went to their chief and told him that they could tell him "the past and the future—what was good and what was bad—what increased and what diminished." The chief asked them to tell him how he could obtain plenty of cattle. The two men worked their sikidy and told the chief to kill all of his bulls, and that "great numbers would come to him" on the following Friday. The chief, doubting, asked what would happen if their prediction didn't come true, and the two men promised they would pay with their lives. The chief agreed and killed his bulls. On Thursday, thinking he'd been duped, he prematurely killed the first of the two men who'd told him about the divinatory art. On Friday, however, "vast herds" came amidst heavy rain, crowding together to fill an immense plain. The chief lamented the mpisikidy's wrongful execution and ordered a grand funeral for him. The chief took the second man as his close adviser and friend, and trusted the sikidy forever afterwards. The British missionary William Ellis recorded in 1839 two idiomatic expressions used in Madagascar that come from this story: "Tsy mahandry andro Zoma" (lit. 'He cannot wait 'til Friday') is said of someone extremely impatient, and heavy rainshowers falling in rapid succession are called "sese omby" (lit. 'a crowding together of cattle'). == Rites and arrangement of seeds == The divination is performed by a practitioner called an mpisikidy, ny màsina (lit. 'sacred one'), ombiasy, or ambiàsa (derived from the Arabic anbia, meaning 'prophet') who guides the client through the process and interprets the results in the context of the client's inquiries and desires. As part of an mpisikidy's formal initiation into the art, which includes a long period of apprenticeship, the initiate (called a mianatsy) must gather between 124 and 200 fàno (Entada sp.) or kily (tamarind) tree seeds for his subsequent ritual use in sikidy. 
Raymond Decary writes that, at least among the Sakalava, a man must be 40 years old before learning and practicing sikidy, or he risks death. Before beginning to study, a student practitioner must make incisions at the tips of his index finger, his middle finger, and his tongue, and put within the incisions a paste containing red pepper and crushed wasp. This paste impregnates the fingers that will move the seeds of the sikidy and the tongue that will speak their revelations with the power to decipher the sikidy. Once this is done, he leaves at dawn to search for a fano (Entada chrysostachys) tree. Upon finding it, he throws his spear at its branches, shaking the tree and causing its large seed pods to fall. During this act, some initiates say: "When you were on the steep peak and in the dense forest, on you the crabs climbed, from you the crocodiles made their bed, with their paws the birds trod on you. Whether you are suspended in the trees or buried, you are never dried up nor rotten." In his study (written in 1941 and revised in 1948), Decary reported that the salary paid by a mianatsy to his master is "not very high": up to five francs, plus a red rooster's feather. The mpisikidy ritually arranges his seeds into a sixteen-column table consisting of four columns of randomly-generated data (representing fate) and eight columns of data derived from logical ope

    Read more →