AI Coding Book

AI Coding Book — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Immuni

    Immuni

    Immuni was an open-source COVID-19 contact tracing app used for digital contact tracing in Italy, dismissed on 31 December 2022, after a long and debated criticism for having been a failure due to the lack of trust placed by citizens. Immuni COVID-19 contact-tracing app had in fact been downloaded only by 12% of Italians between 14 and 75 years old (the government had previously stated that, in order for the app to work properly, it should have been downloaded by at least 60% of Italians). It makes use of the Apple/Google Exposure Notification system. == Development == It was developed by Bending Spoons and released by the Italian Ministry of Health on 1 June 2020. After a testing phase in 4 Italian regions (Abruzzo, Apulia, Liguria, Marche), the app started being active in the whole country on 15 June 2020. The app was initially released on App Store and Google Play, and since 1 February 2021 it is available on the Huawei AppGallery as well. === Source code === The source code was published on GitHub on the 25 May. The app only works in Italy, but compatibility with other European contact tracing apps was a goal. Since 19 October 2020 the app supports key-exchanges with the EU Interoperability Gateway and is therefore able to communicate with contact tracing apps of other EU countries. == Shutdown == As of 16 December 2020, the app was downloaded more than 10 million times, a number which increased to 21.882.502 downloads the day before the app's shutdown. On 27 December 2022 the Italian Ministry of Health announced that the app and its infrastructures will be dismissed on the 31 December of the same year.

    Read more →
  • Parchive

    Parchive

    Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet, but it can be used for protecting any kind of data from data corruption, disc rot, bit rot, and accidental or malicious damage. Despite the name, Parchive uses more advanced techniques (specifically error correction codes) than simplistic parity methods of error detection. As of 2015, PAR1 is obsolete, PAR2 is mature for widespread use, and PAR3 is a discontinued experimental version developed by MultiPar author Yutaka Sawada. The original SourceForge Parchive project has been inactive since April 30, 2015. A new PAR3 specification has been worked on since April 28, 2019 by PAR2 specification author Michael Nahas. An alpha version of the PAR3 specification has been published on January 29, 2022 while the program itself is being developed. == History == Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and the underlying protocol, NNTP was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short in length and limited to 7-bit ASCII text. Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8 bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained. With the introduction of Parchive, parity files could be created that were then uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive included the construction of small index files (.par in version 1 and .par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity. Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1 where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if there is no small index file available). In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed–Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download. Version 1 became widely used on Usenet, but it did suffer some limitations: It was restricted to handle at most 255 files. The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.) The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based. It was strongly tied to Usenet and it was felt that a more general tool might have a wider audience. In January 2002, Howard Fukada proposed that a new Par2 specification should be devised with the significant changes that data verification and repair should work on blocks of data rather than whole files, and that the algorithm should switch to using 16 bit numbers rather than the 8 bit numbers that PAR1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002. Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. Abandoned since 2004, Paul Houle created phpar2 to supersede par2cmdline. Yutaka Sawada created MultiPar to supersede QuickPar. MultiPar uses par2j.exe (which is partially based on par2cmdline's optimization techniques) to use as MultiPar's backend engine. == Versions == Versions 1 and 2 of the file format are incompatible. (However, many clients support both.) === Par1 === For Par1, the files f1, f2, ..., fn, the Parchive consists of an index file (f.par), which is CRC type file with no recovery blocks, and a number of "parity volumes" (f.p01, f.p02, etc.). Given all of the original files except for one (for example, f2), it is possible to create the missing f2 given all of the other original files and any one of the parity volumes. Alternatively, it is possible to recreate two missing files from any two of the parity volumes and so forth. Par1 supports up to a total of 256 source and recovery files. === Par2 === Par2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after the "+" in the filename indicates how many blocks it contains, and the number after "vol" indicates the number of the first recovery block within the PAR2 file. If an index file of a download states that 4 blocks are missing, the easiest way to repair the files would be by downloading filename.vol003+04.PAR2. However, due to the redundancy, filename.vol007+06.PAR2 is also acceptable. There is also an index file filename.PAR2, it is identical in function to the small index file used in PAR1. Par2 specification supports up to 32,768 source blocks and up to 65,535 recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file. Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode. Directory support is included in the PAR2 specification, but most or all implementations do not support it. === Par3 === The Par3 specification was originally planned to be published as an enhancement over the Par2 specification. However, to date, it has remained closed source by specification owner Yutaka Sawada. A discussion on a new format started in the GitHub issue section of the maintained fork par2cmdline on January 29, 2019. The discussion led to a new format which is also named as Par3. The new Par3 format's specification is published on GitHub, but remains being an alpha draft as of January 28, 2022. The specification is written by Michael Nahas, the author of Par2 specification, with the help from Yutaka Sawada, animetosho and malaire. The new format claims to have multiple advantages over the Par2 format, including support for: More than 216 files and more than 216 blocks. Packing small files into one block, as well as deduplication when a block appears in multiple files. UTF-8 file names. File permissions, hard links, symbolic/soft links, and empty directories. Embedding PAR data inside other formats, like ZIP archives or ISO disk images. "Incremental backups", where a user creates recovery files for some file or folder, change some data, and create new recovery files reusing some of the older files. More error correction code algorithms (such as LDPC and sparse random matrix). BLAKE3 hashes, dropping support for the MD5 hashes used in PAR2. == Software == === Multi-platform === par2+tbb (GPLv2) — a concurrent (multithreaded) version of par2cmdline 0.4 using TBB. Only compatible with x86 based CPUs. It is available in the FreeBSD Ports system as par2cmdline-tbb. Original par2cmdline — (obsolete). Available in the FreeBSD Ports system as par2cmdline. par2cmdline maintained fork by BlackIkeEagle. par2cmdline-mt is another multithreaded version of par2cmdline using OpenMP, GPLv2, or later. Currently merged into BlackIkeEagle's fork and maintained there. ParPar (CC0) is a high performance, multithreaded PAR2 client and Node.js library. Does not support verifying or repair, it can currently only create PAR2 archives. par2deep (LGPL-3.0) — Produce, verify and repair par2 files recursively, both on the command line as well as with the aid of a graphical user interface. It is available in the Python Package Index system as par2deep. par2cron (MIT License) is an o

    Read more →
  • Chinese speech synthesis

    Chinese speech synthesis

    Chinese speech synthesis is the application of speech synthesis to the Chinese language (usually Standard Chinese). It poses additional difficulties due to Chinese characters frequently having different pronunciations in different contexts and the complex prosody, which is essential to convey the meaning of words, and sometimes the difficulty in obtaining agreement among native speakers concerning what the correct pronunciation is of certain phonemes. == Concatenation (Ekho and KeyTip) == Recordings can be concatenated in any desired combination, but the joins sound forced (as is usual for simple concatenation-based speech synthesis) and this can severely affect prosody; these synthesizers are also inflexible in terms of speed and expression. However, because these synthesizers do not rely on a corpus, there is no noticeable degradation in performance when they are given more unusual or awkward phrases. Ekho is an open source TTS which simply concatenates sampled syllables. It currently supports Cantonese, Mandarin, and experimentally Korean. Some of the Mandarin syllables have been pitched-normalised in Praat. A modified version of these is used in Gradint's "synthesis from partials". cjkware.com used to ship a product called KeyTip Putonghua Reader which worked similarly; it contained 120 Megabytes of sound recordings (GSM-compressed to 40 Megabytes in the evaluation version), comprising 10,000 multi-syllable dictionary words plus single-syllable recordings in 6 different prosodies (4 tones, neutral tone, and an extra third-tone recording for use at the end of a phrase). == Lightweight synthesizers (eSpeak and Yuet) == The lightweight open-source speech project eSpeak, which has its own approach to synthesis, has experimented with Mandarin and Cantonese. eSpeak was used by Google Translate from May 2010 until December 2010. The commercial product "Yuet" is also lightweight (it is intended to be suitable for resource-constrained environments like embedded systems); it was written from scratch in ANSI C starting from 2013. Yuet claims a built-in NLP model that does not require a separate dictionary; the speech synthesised by the engine claims clear word boundaries and emphasis on appropriate words. Communication with its author is required to obtain a copy. Both eSpeak and Yuet can synthesis speech for Cantonese and Mandarin from the same input text, and can output the corresponding romanisation (for Cantonese, Yuet uses Yale and eSpeak uses Jyutping; both use Pinyin for Mandarin). eSpeak does not concern itself with word boundaries when these don't change the question of which syllable should be spoken. == Corpus-based == A "corpus-based" approach can sound very natural in most cases but can err in dealing with unusual phrases if they can't be matched with the corpus. The synthesiser engine is typically very large (hundreds or even thousands of megabytes) due to the size of the corpus. === iFlyTek === Anhui USTC iFlyTek Co., Ltd (iFlyTek) published a W3C paper in which they adapted Speech Synthesis Markup Language to produce a mark-up language called Chinese Speech Synthesis Markup Language (CSSML) which can include additional markup to clarify the pronunciation of characters and to add some prosody information. The amount of data involved is not disclosed by iFlyTek but can be seen from the commercial products that iFlyTek have licensed their technology to; for example, Bider's SpeechPlus is a 1.3 Gigabyte download, 1.2 Gigabytes of which is used for the highly compressed data for a single Chinese voice. iFlyTek's synthesiser can also synthesise mixed Chinese and English text with the same voice (e.g. Chinese sentences containing some English words); they claim their English synthesis to be "average". The iFlyTek corpus appears to be heavily dependent on Chinese characters, and it is not possible to synthesize from pinyin alone. It is sometimes possible by means of CSSML to add pinyin to the characters to disambiguate between multiple possible pronunciations, but this does not always work. === NeoSpeech === There is an online interactive demonstration for NeoSpeech speech synthesis, which accepts Chinese characters and also pinyin if it's enclosed in their proprietary "VTML" markup. === Mac OS === Mac OS had Chinese speech synthesizers available up to version 9. This was removed in 10.0 and reinstated in 10.7 (Lion). === Historical corpus-based synthesizers (no longer available) === A corpus-based approach was taken by Tsinghua University in SinoSonic, with the Harbin dialect voice data taking 800 Megabytes. This was planned to be offered as a download but the link was never activated. Nowadays, only references to it can be found on Internet Archive. Bell Labs' approach, which was demonstrated online in 1997 but subsequently removed, was described in a monograph "Multilingual Text-to-Speech Synthesis: The Bell Labs Approach" (Springer, October 31, 1997, ISBN 978-0-7923-8027-6), and the former employee who was responsible for the project, Chilin Shih (who subsequently worked at the University of Illinois) put some notes about her methods on her website.

    Read more →
  • Small Data

    Small Data

    Small Data: the Tiny Clues that Uncover Huge Trends is Martin Lindstrom's seventh book. It chronicles his work as a branding expert, working with consumers across the world to better understand their behavior. The theory behind the book is that businesses can better create products and services based on observing consumer behavior in their homes, as opposed to relying solely on big data. == Content == The book is based on a several year period of consumer studies for major corporations across the globe. It features case studies of the author's work interviewing consumers in their homes and using his observations to create hypotheses as to why they use products the way that they do. == Public reception == The book was a New York Times Bestseller upon release and was positively reviewed on several websites, Including Entrepreneur and Forbes. In 2016, it was named a Best Business Book by strategy+business and one of Inc. Magazine's Best Sales and Marketing books.

    Read more →
  • Cybernetic Serendipity

    Cybernetic Serendipity

    Cybernetic Serendipity was an exhibition of cybernetic art curated by Jasia Reichardt, shown at the Institute of Contemporary Arts, London, England, from 2 August to 20 October 1968, and then toured across the United States. Two stops in the United States were the Corcoran Annex (Corcoran Gallery of Art), Washington, D.C., from 16 July to 31 August 1969, and the newly opened Exploratorium in San Francisco, from 1 November to 18 December 1969. == Content == One part of the exhibition was concerned with algorithms and devices for generating music. Some exhibits were pamphlets describing the algorithms, whilst others showed musical notation produced by computers. Devices made musical effects and played tapes of sounds made by computers. Peter Zinovieff lent part of his studio equipment - visitors could sing or whistle a tune into a microphone and his equipment would improvise a piece of music based on the tune. Another part described computer projects such as Gustav Metzger's self-destructive Five Screens With Computer, a design for a new hospital, a computer programmed structure, and dance choreography. The machines and installations were a very noticeable part of the exhibition. Gordon Pask produced a collection of large mobiles (Colloquy of Mobiles (1968)) with interacting parts that let the viewers join in the conversation. Many machines formed kinetic environments or displayed moving images. Bruce Lacey contributed his radio-controlled robots and a light-sensitive owl. Nam June Paik was represented by Robot K-456 and televisions with distorted images. Jean Tinguely provided two of his painting machines. Edward Ihnatowicz's biomorphic hydraulic ear (Sound Activated Mobile (SAM, 1968)) turned toward sounds and John Billingsley's Albert 1967 turned to face light. Wen-Ying Tsai presented his interactive cybernetic sculptures of vibrating stainless-steel rods, stroboscopic light, and audio feedback control. Several artists exhibited machines that drew patterns that the visitor could take away, or involved visitors in games. Cartoonist Rowland Emett designed the mechanical computer Forget-me-not, which was commissioned by Honeywell. Another section explored the computer's ability to produce text - both essays and poetry. Different programs produced Haiku, children's stories, and essays. One of the first computer-generated poems, by Alison Knowles and James Tenney, was included in the exhibition and catalogue. Computer-generated movies were represented by John Whitney's Permutations and a Bell Labs movie on their technology for producing movies. Some samples included images of tesseracts rotating in four dimensions, a satellite orbiting the Earth, and an animated data structure. Computer graphics were also represented, including pictures produced on cathode ray oscilloscopes and digital plotters. There was a variety of posters and graphics demonstrating the power of computers to do complex (and apparently random) calculations. Other graphics showed a simulated Mondrian and the iconic decreasing squares spiral that appeared on the exhibition's poster and book. The Boeing Company exhibited their use of wireframe graphics. The innovative computer-generated sculpture, Quad 1, was displayed at the Cybernetic Serendipity exhibit. Created by the American abstract expressionist sculptor, Robert Mallary, in 1968, Quad 1 is widely believed to be the world's first Computer Aided Design sculpture. Keith Albarn & Partners contributed to the design of the exhibition. Reflecting the prominence of music in the show, a ten-track album Cybernetic Serendipity Music was released by the ICA to accompany the show. Artists featured included Iannis Xenakis, John Cage, and Peter Zinovieff, a detail of whose graphic score for 'Four Sacred April Rounds’ (1968) was used as the cover artwork. == Attendance == Time magazine noted that there had been 40,000 visitors to the London exhibition. Other reports suggested visitor numbers were as high as 44,000 to 60,000. However, the ICA did not accurately count visitors. == After-effects == The exhibition provided the energy for the formation of British Computer Arts Society which continued to explore the interaction between science, technology and art, and put on exhibitions (for example Event One at the Royal College of Art). Several pieces were purchased by the Exploratorium in 1971, some of which are on display to this day. In 2014 the ICA held a retrospective exhibition Cybernetic Serendipity: A Documentation which included documents, installation photographs, press reviews and publications and a series of discussions in one of which Peter Zinovieff took part. To coincide with the exhibition, Cybernetic Serendipity Music was re-released as a limited-edition vinyl LP by The Vinyl Factory. The Victoria and Albert Museum marked the 50th anniversary with an exhibition in 2018 entitled "Chance and Control: Art in the Age of Computers". The V&A exhibition included many works by artists who featured in the original ICA show, plus related ephemera. "Chance and Control" subsequently toured to Chester Visual Arts and Firstsite, Colchester. In 2020, The Centre Pompidou exhibited the replica of Gordon Pask's 1968 Colloquy of Mobiles, reproduced by Paul Pangaro and TJ McLeish in 2018. In 2022 the Australian National University's School of Cybernetics launched the school by presenting an exhibition Australian Cybernetic: a point through time. The exhibition included works from Cybernetic Serendipity (1968), Australia ‘75: Festival of Creative Arts and Science (1975), and contemporary pieces curated by the School of Cybernetics. In describing Reichardt's Cybernetic Serendipity exhibition the school stated that it "represented points of expanding the cybernetic imagination" and was a "ground-breaking" "glimpse of a future in which computers were entangled with people and cultures, and through this she fashioned a blueprint for the future of computing that has since inspired generations".

    Read more →
  • Reference data

    Reference data

    Reference data is data used to classify or categorize other data. Typically, they are static or slowly changing over time. Examples of reference data include: Units of measurement Country codes Corporate codes Fixed conversion rates e.g., weight, temperature, and length Calendar structure and constraints Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's reference data. Without consistency across business process or applications, for example, similar things may be described in quite different ways. Reference data gain in value when they are widely re-used and widely referenced. Examples of good practice in reference data management include: Formalize the reference data management Use external reference data as much as possible Govern the reference data specific to your enterprise Manage reference data at enterprise level Version control your reference data

    Read more →
  • Artificial intelligence in marketing

    Artificial intelligence in marketing

    Artificial intelligence marketing (AI marketing) is a form of marketing that uses artificial intelligence concepts and models such as machine learning, natural language processing, and computer vision to achieve marketing goals. The main difference between AI marketing and traditional forms of marketing reside in the reasoning, which is performed through a computer algorithm rather than a human. Each form of marketing has a different technique to the core of the marketing theory. Traditional marketing directly focuses on the needs of consumers; meanwhile some believe the shift AI may cause will lead marketing agencies to manage consumer needs instead. AI is used in various digital marketing spaces, such as content marketing, email marketing, online advertisement (in combination with machine learning), social media marketing, affiliate marketing, and beyond. == Historical development == AI in marketing has a long history, which goes all the way back to the 1980s. At this time, AI research was focusing on expert systems and robotics. Despite the initial research and the studies that were carried out, AI adoption remained limited. Research on it came to a stop for a while, until research was revived two decades later with the advancement in technology, the rise of big data, and a significant increase in computational power. Eventually, AI became very popular in the marketing world, and caught the eyes of many researchers as well as professionals. A large‐scale bibliometric study covering 1,580 peer‑reviewed papers published between 1982 and 2020 confirms that scholarly output on AI in marketing has surged since 2017, with Expert Systems with Applications emerging as the most prolific outlet. Prior to the application of artificial Intelligence in marketing, there was something called "collaborative filtering". This was used as early as 1998 by Amazon, and one of the first ways companies predicted consumer behavior, which enabled millions of recommendations to different customers. Personalized recommender systems are now widely used, for example to suggest music on Spotify, or TV shows on Netflix. A big milestone in AI marketing happened in 2014, when programmatic ad buying gained much greater popularity. Marketing consists of numerous manual tasks such as researching target markets, insertion orders, and managing high budgets as well as prices. In order to cut costs, and remove the need for these tedious tasks, many companies started to automate the marketing process with AI. In 2015, Google introduced RankBrain, a machine learning component of its search algorithm designed to interpret the intent behind user queries. RankBrain was followed by further AI-based search updates, including BERT in 2019, which improved the understanding of conversational queries, and the Multitask Unified Model (MUM) in 2021, which is multimodal and processes information across 75 languages. These advances shifted search engine optimization practice away from keyword matching toward content that satisfies user intent. Artificial intelligence is increasingly used in marketing to personalize user experiences and automate decision-making. For example, Netflix uses AI algorithms to recommend content based on viewing history, while Sephora employs chatbots to assist customers with product selection and availability. Programmatic advertising platforms like Google Ads leverage machine learning to optimize bidding strategies and target audiences more effectively. These applications demonstrate how AI enhances efficiency, engagement, and conversion rates across digital channels. === Artificial neural networks === An artificial neural network is a form of computer program modeled on the brain and nervous system of humans. Neural networks are composed of a series of interconnected processing neurons that function in unison to achieve certain outcomes. Using “human-like trial and error learning methods neural networks detect patterns existing within a data set ignoring data that is not significant while emphasizing the data which is most influential”. From a marketing perspective, neural networks are a form of software tool used to assist in decision making. Neural networks are effective in gathering and extracting information from large data sources and have the ability to identify cause and effect within tha data. These neural nets through the process of learning, identify relationships and connections between databases. Once knowledge has been accumulated, neural networks can be relied on to provide generalizations and can apply past knowledge and learning to a variety of situations. Neural networks help fulfill the role of marketing companies through effectively aiding in market segmentation and measurement of performance while reducing costs and improving accuracy. Due to their learning ability, flexibility, adaption, and knowledge discovery, neural networks offer many advantages over traditional models. Neural networks can be used to assist in pattern classification, forecasting and marketing analysis. == Tools and uses == Classification of customers can be facilitated through the neural network approach allowing companies to make informed marketing decisions. An example of this was employed by Spiegel Inc., a firm dealing in direct-mail operations that used neural networks to improve efficiencies. Using software developed by NeuralWare Inc., Spiegel identified the demographics of customers who had made a single purchase and those customers who had made repeat purchases. Neural networks where then able to identify the key patterns and consequently identify the customers that were most likely to repeat purchase. Understanding this information allowed Spiegel to streamline marketing efforts, and reduced costs. Sales forecasting “is the process of estimating future events with the goal of providing benchmarks for monitoring actual performance and reducing uncertainty". Artificial intelligence techniques have emerged to facilitate the process of forecasting through increasing accuracy in the areas of demand for products, distribution, employee turnover, performance measurement, and inventory control. An example of forecasting using neural networks is the Airline Marketing Assistant/Tactician; an application developed by BehabHeuristics which allows for the forecasting of passenger demand and consequent seat allocation through neural networks. This system has been used by National air Canada and USAir. Neural networks provide a useful alternative to traditional statistical models due to their reliability, time-saving characteristics and ability to recognize patterns from incomplete or noisy data. Examples of marketing analysis systems includes the Target Marketing System developed by Churchull Systems for Veratex Corporation. This support system scans a market database to identify dormant customers allowing management to make decisions regarding which key customers to target. When performing marketing analysis, neural networks can assist in the gathering and processing of information ranging from consumer demographics and credit history to the purchase patterns of consumers. Predictive analytics is a form of analytics involving the use of historical data and artificial intelligence algorithms to predict future trends and outcomes. It serves as a tool for anticipating and understanding user behavior based on patterns found in data. Predictive analytics uses artificial intelligence machine learning algorithms to recognize and predict patterns within data. Machine learning algorithms analyze the data, recognize patterns, and make predictions through continuous learning and adaptation. Predictive analytics is widely used across businesses and industries as a way to identify opportunities, avoid risks, and anticipate customer needs based on information derived from the analysis of user data. By analyzing historical customer data, artificial intelligence algorithms can deliver relevant and targeted marketing content. Recent systematic reviews show that generative large‑language models such as GPT‑3 and GPT‑4 are now routinely embedded in predictive‑analytics pipelines to mine unstructured market data and anticipate customer intent with greater precision. Personalization engines use artificial intelligence and machine learning to provide content or advertisements that are relevant to the user. User data is gathered, which then gets processed with machine learning, and patterns and trends among the users are identified. Users with shared characteristics or behaviors are then segmented into groups, and the personalization engine adjusts content and advertisements to match each segment's preferences. By processing a large amount of data, personalization engines are able to match users to advertisements and recommendations that align with their interests or preferences. Field evidence from consumer‑goods and electronics firms indicates that AI‑driven personalization can raise

    Read more →
  • Master data

    Master data

    Master data represents "data about the business entities that provide context for business transactions". The most commonly found categories of master data are parties (individuals and organisations, and their roles, such as customers, suppliers, employees), products, financial structures (such as ledgers and cost centres) and locational concepts. Master data should be distinguished from reference data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. Master data is, by its nature, almost always non-transactional in nature. There exist edge cases where an organization may need to treat certain transactional processes and operations as "master data". This arises, for example, where information about master data entities, such as customers or products, is only contained within transactional data such as orders and receipts and is not housed separately. ISO 8000 is the international standard for data quality and data portability in master data. == Alternative definition == An alternative definition of the term master data is that it represents the business objects that contain the most valuable, agreed upon information shared across an organization. In this sense, it gives context to business activities and transactions, answering questions like who, what, when and how as well as expanding the ability to make sense of these activities through categorizations, groupings and hierarchies. It can cover relatively static reference data, transactional, unstructured, analytical, hierarchical and metadata. What constitutes master data under this definition is therefore not about an essential quality of the data (e.g. it is a business entity that provides context for business transactions), but rather about the context in which the organisation has decided to treat the data. == Externally-defined master data == For most organisations, most or all master data is defined and managed within that organisation. Some master data, however, may be externally defined and managed. This represents the single source of basic business data used across a marketplace, regardless of organisation or location. Thus, it can be used by multiple enterprises within a value chain, facilitating "integration of multiple data sources and literally [putting] everyone in the market on the same page." An example of market master data is the Universal Product Code (UPC) found on consumer products. == Master data management == Curating and managing master data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's master data. Master Data is therefore the focus of the information technology (IT) discipline of master data management (MDM). Without this discipline in place, organisations commonly encounter difficulties with having multiple versions of "the truth" about a business entity, both within individual applications, and distributed across applications.

    Read more →
  • Scientific Working Group – Imaging Technology

    Scientific Working Group – Imaging Technology

    The Scientific Working Group on Imaging Technology was convened by the Federal Bureau of Investigation in 1997 to provide guidance to law enforcement agencies and others in the criminal justice system regarding the best practices for photography, videography, and video and image analysis. This group was terminated in 2015. == History == As technology has advanced through the years, law enforcement has needed to stay abreast of emerging technological advances and use these in the investigation of crime. A factor that is considered when new technology is used in these investigations is the determination of whether the use of that new technology will be admissible in court. The judicial system in the United States currently has two standards used in the determination of admissibility of testimony regarding scientific evidence; the Daubert Standard and the Frye Standard. These standards guide the courts in the admissibility of testimony derived from the use of new technologies and scientific techniques. The Federal Bureau of Investigation (FBI), seeking to address possible admissibility issues with such testimony, established Scientific Working Groups starting with the Scientific Working Group on DNA Analysis and Methods (SWGDAM) in 1988. The goal of these groups is to open lines of communication between law enforcement agencies and forensic laboratories around the world while providing guidance on the use of new and innovative technologies and techniques. This guidance can lead to admissibility of evidence and/or testimony, provided proper methods in the collection of evidence and its analysis are employed. In 2009, the National Academy of Sciences released a report entitled, "Strengthening Forensic Science in the United States: A Path Forward." This report addresses many topics including challenges and disparities facing the forensic science community, standardization, certification of practitioners and accreditation of their respective entities, problems related to the interpretation of forensic evidence, the need for research, and the admission of forensic science evidence in litigation. This report mentions the Scientific Working Groups and their role in forensic science. The history of imaging technology (photography) can be said to extend back to the times of Chinese philosopher Mo-Ti (470-390 B.C.) who described the principles behind the precursor to the camera obscura. Since that time, advances in imaging technology include the discovery of chemical photographic processes in the 19th century and the use of electronic imaging technology that includes analog video cameras and digital video and still cameras. By the mid 1990s, it was apparent that technologically advanced camera systems such as these were being adopted for use in the criminal justice system. This led the FBI to convene a meeting of individuals working in the field of forensic imaging from federal, state, local, and foreign law enforcement, and the U.S. military, during the summer of 1997. As a result of this meeting, the Technical Working Group on Imaging Technology was formed from a core group of the meeting’s participants. This group later became the Scientific Working Group on Imaging Technology (SWGIT). Prior to the inception of SWGIT, some law enforcement agencies began adopting digital imaging technology. Due to the lack of guidelines or standards, some of these agencies attempted to replace all their film cameras with substandard digital cameras, only to find that the equipment they had purchased was not capable of accomplishing the mission for which they were intended. At that time only low resolution digital cameras were deemed affordable by some law enforcement agencies. Some of these agencies were forced to rethink their photography procedures and reverted to the use of film cameras or replaced their low-resolution digital cameras with higher quality, more expensive equipment. Also lacking at this early stage was guidance on how to store and archive digital image files. When SWGIT was formed, it was tasked with providing guidance to law enforcement and others in the criminal justice system by releasing documents that describe the best practices and guidelines for the use of imaging technology, to include these concerns and many others. This group was terminated in 2015. == SWGIT Function == During its existence, SWGIT provided information on the appropriate use of various imaging technologies including both established and new. This was accomplished through the release of documents such as the SWGIT Best Practices documents. As changes in technology occurred, these documents were updated. Over the course of its existence, SWGIT collaborated with other Scientific Working Groups to address imaging concerns within their respective disciplines. SWGIT published over 20 documents that dealt specifically with imaging technology. SWGIT also co-published documents with the Scientific Working Group on Digital Evidence (SWGDE) that had a component or components dealing with imaging technology. SWGIT also provided imaging technology guidance and input for documents from the Scientific Working Group on Friction Ridge Analysis, Study and Technology (SWGFAST), the Scientific Working Group for Forensic Document Examination (SWGDOC), and the Scientific Working Group on Shoeprint and Tire Tread Evidence (SWGTREAD). SWGIT assisted the American Society of Crime Lab Directors/Laboratory Accreditation Board (ASCLD/LAB) in the writing of definitions and standards for the accreditation of Digital and Multimedia Evidence sections of crime laboratories. In addition to releasing documents, SWGIT members disseminated best practices for law enforcement professionals where imaging technology was concerned. This was carried out by attending and lecturing at meetings and conferences of various forensic organizations that included: The American Academy of Forensic Sciences (AAFS) The International Association for Identification (IAI) The Law Enforcement and Emergency Services Video Association (LEVA) The American Society of Crime Lab Directors (ASCLD) The SWGIT membership consisted of approximately fifty scientists, photographers, instructors, and managers from more than two dozen federal, state, and local law enforcement agencies, as well as from the academic and research communities. The membership elected its officers from within. SWGIT was composed of the Executive Committee, four standing subcommittees, and ad hoc subcommittees appointed on an as-needed basis. The standing subcommittees were: Image Analysis, Forensic Photography, Video, and Outreach. This group was terminated in 2015. == Legal Proceedings == The following court cases have conducted Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579 (1993) hearings in which SWGIT best practice documents have been cited as accepted protocol, methodology, and as generally accepted techniques in the forensic community: U. S. v. Rudy Frabizio, U.S. District Court, Boston, MA, 2008 (Image Authentication) U.S. v. Nobumochi Furukawa, U.S. District Court, Minnesota, 2007 (Video Authentication) U.S. v. John Stroman, U.S. District Court, South Carolina, 2007 (Facial Comparison Analysis) State of Texas v. Daniel Day, Tarrant County Texas, 2005 (Camera Identification to Images) U.S. v. Marc Watzman, U.S. District Court, Northern Illinois, 2004 (Video Authentication) U.S. v. McKreith, U.S. District Court, Fort Lauderdale, FL, 2002 (Photo comparison of shirt) == Termination == This group was unfunded by the FBI in 2015.

    Read more →
  • Conceptualization (information science)

    Conceptualization (information science)

    In information science, a conceptualization is an abstract simplified view of some selected parts of the world, containing the objects, concepts, and other entities that are presumed of interest for some particular purpose and the relationships between them. An explicit specification of a conceptualization is an ontology, and it may occur that a conceptualization can be realized by several distinct ontologies. An ontological commitment in describing ontological comparisons is taken to refer to that subset of elements of an ontology shared with all the others. "An ontology is language-dependent", its objects and interrelations described within the language it uses, while a conceptualization is always the same, more general, its concepts existing "independently of the language used to describe it". The relation between these terms is shown in the figure to the right. Not all workers in knowledge engineering use the term "conceptualization", but instead refer to the conceptualization itself, or to the ontological commitment of all its realizations, as an overarching ontology. == Purpose and implementation == As a higher level abstraction, a conceptualization facilitates the discussion and comparison of its various ontologies, facilitating knowledge sharing and reuse. Each ontology based upon the same overarching conceptualization maps the conceptualization into specific elements and their relationships. The question then arises as to how to describe the "conceptualization" in terms that can encompass multiple ontologies. This issue has been called the Tower of Babel problem, that is, how can persons used to one ontology talk with others using a different ontology? This problem is easily grasped, but a general resolution is not at hand. It can be a "bottom-up" or a "top-down" approach, or something in between. However, in more artificial situations, such as information systems, the idea of a "conceptualization" and the "ontological commitment" of various ontologies that realize the "conceptualization" is possible. The formation of a conceptualization and its ontologies involves these steps: specification of the conceptualization ontology concepts: every definition involves the definitions of other terms relationships between the concepts: this step maps conceptual relationships onto the ontology structure groups of concepts: this step may lead to the creation of sub-ontologies formal description of ontology commitments, for example, to make them computer readable An example of moving conception into a language leading to a variety of ontologies is the expression of a process in pseudocode (a strictly structured form of ordinary language) leading to implementation in several different formal computer languages like Lisp or Fortran. The pseudocode makes it easier to understand the instructions and compare implementations, but the formal languages make possible the compilation of the ideas as computer instructions. Another example is mathematics, where a very general formulation (the analog of a conceptualization) is illustrated with "applications" that are more specialized examples. For instance, aspects of a function space can be illustrated using a vector space or a topological space that introduce interpretations of the "elements" of the conceptualization and additional relationships between them but preserve the connections required in the function space.

    Read more →
  • Semantic translation

    Semantic translation

    Semantic translation is the process of using semantic information to aid in the translation of data in one representation or data model to another representation or data model. Semantic translation takes advantage of semantics that associate meaning with individual data elements in one dictionary to create an equivalent meaning in a second system. An example of semantic translation is the conversion of XML data from one data model to a second data model using formal ontologies for each system such as the Web Ontology Language (OWL). This is frequently required by intelligent agents that wish to perform searches on remote computer systems that use different data models to store their data elements. The process of allowing a single user to search multiple systems with a single search request is also known as federated search. Semantic translation should be differentiated from data mapping tools that do simple one-to-one translation of data from one system to another without actually associating meaning with each data element. Semantic translation requires that data elements in the source and destination systems have "semantic mappings" to a central registry or registries of data elements. The simplest mapping is of course where there is equivalence. There are three types of Semantic equivalence: Class Equivalence - indicating that class or "concepts" are equivalent. For example: "Person" is the same as "Individual" Property Equivalence - indicating that two properties are equivalent. For example: "PersonGivenName" is the same as "FirstName" Instance Equivalence - indicating that two individual instances of objects are equivalent. For example: "Dan Smith" is the same person as "Daniel Smith" Semantic translation is very difficult if the terms in a particular data model do not have direct one-to-one mappings to data elements in a foreign data model. In that situation, an alternative approach must be used to find mappings from the original data to the foreign data elements. This problem can be alleviated by centralized metadata registries that use the ISO-11179 standards such as the National Information Exchange Model (NIEM).

    Read more →
  • Savepoint

    Savepoint

    A savepoint is a way of implementing subtransactions (also known as nested transactions) within a relational database management system by indicating a point within a transaction that can be "rolled back to" without affecting any work done in the transaction before the savepoint was created. Multiple savepoints can exist within a single transaction. Savepoints are useful for implementing complex error recovery in database applications. If an error occurs in the midst of a multiple-statement transaction, the application may be able to recover from the error (by rolling back to a savepoint) without needing to abort the entire transaction. A savepoint can be declared by issuing a SAVEPOINT name statement. All changes made after a savepoint has been declared can be undone by issuing a ROLLBACK TO SAVEPOINT name command. Issuing RELEASE SAVEPOINT name will cause the named savepoint to be discarded, but will not otherwise affect anything. Issuing the commands ROLLBACK or COMMIT will also discard any savepoints created since the start of the main transaction. Savepoints are defined in the SQL standard and are supported by all established SQL relational databases, including PostgreSQL, Oracle Database, Microsoft SQL Server, MySQL, IBM Db2, SQLite (since 3.6.8), Firebird, H2 Database Engine, and Informix (since version 11.50xC3).

    Read more →
  • Phase stretch transform

    Phase stretch transform

    Phase stretch transform (PST) is a computational approach to signal and image processing. One of its utilities is for feature detection and classification. PST is related to time stretch dispersive Fourier transform. It transforms the image by emulating propagation through a diffractive medium with engineered 3D dispersive property (refractive index). The operation relies on symmetry of the dispersion profile and can be understood in terms of dispersive eigenfunctions or stretch modes. PST performs similar functionality as phase-contrast microscopy, but on digital images. PST can be applied to digital images and temporal (time series) data. It is a physics-based feature engineering algorithm. == Operation principle == Here the principle is described in the context of feature enhancement in digital images. The image is first filtered with a spatial kernel followed by application of a nonlinear frequency-dependent phase. The output of the transform is the phase in the spatial domain. The main step is the 2-D phase function which is typically applied in the frequency domain. The amount of phase applied to the image is frequency dependent, with higher amount of phase applied to higher frequency features of the image. Since sharp transitions, such as edges and corners, contain higher frequencies, PST emphasizes the edge information. Features can be further enhanced by applying thresholding and morphological operations. PST is a pure phase operation whereas conventional edge detection algorithms operate on amplitude. == Physical and mathematical foundations of phase stretch transform == Photonic time stretch technique can be understood by considering the propagation of an optical pulse through a dispersive fiber. By disregarding the loss and non-linearity in fiber, the non-linear Schrödinger equation governing the optical pulse propagation in fiber upon integration reduces to: E o ( z , t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( 0 , ω ) ⋅ e − i β 2 z ω 2 2 ⋅ e i ω t d ω {\displaystyle E_{o}(z,t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(0,\omega )\cdot e^{\frac {-i\beta _{2}z\omega ^{2}}{2}}\cdot e^{i\omega {t}}\,d\omega } (1) where β 2 {\displaystyle \beta _{2}} = GVD parameter, z is propagation distance, E o ( z , t ) {\displaystyle E_{o}(z,t)} is the reshaped output pulse at distance z and time t. The response of this dispersive element in the time-stretch system can be approximated as a phase propagator as presented in H ( ω ) = e i φ ( ω ) = e i ∑ m = 0 ∞ φ m ( ω ) = ∏ m = 0 ∞ H m ( ω ) {\displaystyle H(\omega )=e^{i\varphi (\omega )}=e^{i\sum _{m=0}^{\infty }\varphi _{m}(\omega )}=\prod _{m=0}^{\infty }H_{m}(\omega )} (2) Therefore, Eq. 1 can be written as following for a pulse that propagates through the time-stretch system and is reshaped into a temporal signal with a complex envelope given by E o ( t ) = 1 2 π ∫ − ∞ ∞ E ~ i ( ω ) ⋅ H ( ω ) ⋅ e i ω t d ω {\displaystyle E_{o}(t)={\frac {1}{2\pi }}\int _{-\infty }^{\infty }{\tilde {E}}_{i}(\omega )\cdot H(\omega )\cdot e^{i\omega t}\,d\omega } (3) The time stretch operation is formulated as generalized phase and amplitude operations, S { E i ( t ) } = ∫ − ∞ + ∞ F { E i ( t ) } ⋅ e i φ ( ω ) ⋅ L ~ ( ω ) ⋅ e i ω t d ω {\displaystyle \mathbb {S} \{E_{i}(t)\}=\int _{-\infty }^{+\infty }{\mathcal {F}}\{E_{i}(t)\}\cdot e^{i\varphi (\omega )}\cdot {\tilde {L}}(\omega )\cdot e^{i\omega {t}}d\omega } (4) where e i φ ( ω ) {\displaystyle e^{i\varphi (\omega )}} is the phase filter and L ~ ( ω ) {\displaystyle {\tilde {L}}(\omega )} is the amplitude filter. Next the operator is converted to discrete domain, S { E i [ n ] } = 1 N ∑ u = 0 N − 1 F F T { E i ( n ) } ⋅ K ~ ( u ) ⋅ L ~ ( u ) ⋅ e i 2 π N u n {\displaystyle \mathbb {S} \{E_{i}[n]\}={\frac {1}{N}}\sum _{u=0}^{N-1}FFT\{E_{i}(n)\}\cdot {\tilde {K}}(u)\cdot {\tilde {L}}(u)\cdot e^{i{\frac {2\pi }{N}}un}} (5) where u {\displaystyle u} is the discrete frequency, K ~ ( u ) {\displaystyle {\tilde {K}}(u)} is the phase filter, L ~ ( u ) {\displaystyle {\tilde {L}}(u)} is the amplitude filter and FFT is fast Fourier transform. The stretch operator S { } {\displaystyle \mathbb {S} \{\}} for a digital image is then S { E i [ n , m ] } = 1 M N ∑ v = 0 N − 1 ∑ u = 0 M − 1 F F T 2 { E i ( n , m ) } ⋅ K ~ ( u , v ) ⋅ L ~ ( u , v ) ⋅ e i 2 π M u m ⋅ e i 2 π N v n {\displaystyle \mathbb {S} \{E_{i}[n,m]\}={\frac {1}{MN}}\sum _{v=0}^{N-1}\sum _{u=0}^{M-1}FFT^{2}\{E_{i}(n,m)\}\cdot {\tilde {K}}(u,v)\cdot {\tilde {L}}(u,v)\cdot e^{i{\frac {2\pi }{M}}um}\cdot e^{i{\frac {2\pi }{N}}vn}} (6) In the above equations, E i [ n , m ] {\displaystyle E_{i}[n,m]} is the input image, n {\displaystyle n} and m {\displaystyle m} are the spatial variables, F F T 2 {\displaystyle FFT^{2}} is the two-dimensional fast Fourier transform, and u {\displaystyle u} and v {\displaystyle v} are spatial frequency variables. The function K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} is the warped phase kernel and the function L ~ ( u , v ) {\displaystyle {\tilde {L}}(u,v)} is a localization kernel implemented in frequency domain. PST operator is defined as the phase of the Warped Stretch Transform output as follows P S T { E i [ n , m ] } ≜ ∡ { S { E i [ x , y ] } } {\displaystyle PST\{E_{i}[n,m]\}\triangleq \measuredangle \{\mathbb {S} \{E_{i}[x,y]\}\}} (7) where ∡ { } {\displaystyle \measuredangle \{\}} is the angle operator. == PST kernel implementation == The warped phase kernel K ~ ( u , v ) {\displaystyle {\tilde {K}}(u,v)} can be described by a nonlinear frequency dependent phase K ~ ( u , v ) = e i φ ( u , v ) {\displaystyle {\tilde {K}}(u,v)=e^{i\varphi (u,v)}} While arbitrary phase kernels can be considered for PST operation, here we study the phase kernels for which the kernel phase derivative is a linear or sublinear function with respect to frequency variables. A simple example for such phase derivative profiles is the inverse tangent function. Consider the phase profile in the polar coordinate system φ ( u , v ) = φ polar ( r , θ ) = φ polar ( r ) {\displaystyle \varphi (u,v)=\varphi _{\text{polar}}(r,\theta )=\varphi _{\text{polar}}(r)} From d φ ( r ) d r = tan − 1 ⁡ ( r ) {\displaystyle {\frac {d\varphi (r)}{dr}}=\tan ^{-1}(r)} we have φ ( r ) = r tan − 1 ⁡ ( r ) − 1 2 log ⁡ ( r 2 + 1 ) {\displaystyle \varphi (r)=r\tan ^{-1}(r)-{\frac {1}{2}}\log(r^{2}+1)} Therefore, the PST kernel is implemented as φ ( r ) = S ⋅ ( W r ) ⋅ tan − 1 ⁡ ( W r ) − 1 2 log ⁡ ( 1 + ( W r ) 2 ) ( W r max ) ⋅ tan − 1 ⁡ ( W r max ) − 1 2 log ⁡ ( 1 + ( W r max ) 2 ) {\displaystyle \varphi (r)=S\cdot {\frac {(Wr)\cdot \tan ^{-1}(Wr)-{\frac {1}{2}}\log(1+(Wr)^{2})}{(Wr_{\max })\cdot \tan ^{-1}(Wr_{\max })-{\frac {1}{2}}\log(1+(Wr_{\max })^{2})}}} where S {\displaystyle S} and W {\displaystyle W} are real-valued numbers related to the strength and warp of the phase profile == Applications == PST has been used for edge detection in biological and biomedical images as well as synthetic-aperture radar (SAR) image processing, as well as detail and feature enhancement for digital images. PST has also been applied to improve the point spread function for single molecule imaging in order to achieve super-resolution. The transform exhibits intrinsic superior properties compared to conventional edge detectors for feature detection in low contrast visually impaired images. The PST function can also be performed on 1-D temporal waveforms in the analog domain to reveal transitions and anomalies in real time. == Open source code release == On February 9, 2016, a UCLA Engineering research group has made public the computer code for PST algorithm that helps computers process images at high speeds and "see" them in ways that human eyes cannot. The researchers say the code could eventually be used in face, fingerprint, and iris recognition systems for high-tech security, as well as in self-driving cars' navigation systems or for inspecting industrial products. The Matlab implementation for PST can also be downloaded from Matlab Files Exchange. However, it is provided for research purposes only, and a license must be obtained for any commercial applications. The software is protected under a US patent. The code was then significantly refactored and improved to support GPU acceleration. In May 2022, it became one algorithm in PhyCV: the first physics-inspired computer vision library.

    Read more →
  • Enterprise Objects Framework

    Enterprise Objects Framework

    The Enterprise Objects Framework, or simply EOF, was introduced by NeXT in 1994 as a pioneering object-relational mapping product for its NeXTSTEP and OpenStep development platforms. EOF abstracts the process of interacting with a relational database by mapping database rows to Java or Objective-C objects. This largely relieves developers from writing low-level SQL code. EOF enjoyed some niche success in the mid-1990s among financial institutions who were attracted to the rapid application development advantages of NeXT's object-oriented platform. Since Apple Inc's merger with NeXT in 1996, EOF has evolved into a fully integrated part of WebObjects, an application server also originally from NeXT. Many of the core concepts of EOF re-emerged as part of Core Data, which further abstracts the underlying data formats to allow it to be based on non-SQL stores. == History == In the early 1990s NeXT Computer recognized that connecting to databases was essential to most businesses and yet also potentially complex. Every data source has a different data-access language (or API), driving up the costs to learn and use each vendor's product. The NeXT engineers wanted to apply the advantages of object-oriented programming, by getting objects to "talk" to relational databases. As the two technologies are very different, the solution was to create an abstraction layer, insulating developers from writing the low-level procedural code (SQL) specific to each data source. The first attempt came in 1992 with the release of Database Kit (DBKit), which wrapped an object-oriented framework around any database. Unfortunately, NEXTSTEP at the time was not powerful enough and DBKit had serious design flaws. NeXT's second attempt came in 1994 with the Enterprise Objects Framework (EOF) version 1, a complete rewrite that was far more modular and OpenStep compatible. EOF 1.0 was the first product released by NeXT using the Foundation Kit and introduced autoreleased objects to the developer community. The development team at the time was only four people: Jack Greenfield, Rich Williamson, Linus Upson and Dan Willhite. EOF 2.0, released in late 1995, further refined the architecture, introducing the editing context. At that point, the development team consisted of Dan Willhite, Craig Federighi, Eric Noyau and Charly Kleissner. EOF achieved a modest level of popularity in the financial programming community in the mid-1990s, but it would come into its own with the emergence of the World Wide Web and the concept of web applications. It was clear that EOF could help companies plug their legacy databases into the Web without any rewriting of that data. With the addition of frameworks to do state management, load balancing and dynamic HTML generation, NeXT was able to launch the first object-oriented Web application server, WebObjects, in 1996, with EOF at its core. In 2000, Apple Inc. (which had merged with NeXT) officially dropped EOF as a standalone product, meaning that developers would be unable to use it to create desktop applications for the forthcoming Mac OS X. It would, however, continue to be an integral part of a major new release of WebObjects. WebObjects 5, released in 2001, was significant for the fact that its frameworks had been ported from their native Objective-C programming language to the Java language. Critics of this change argue that most of the power of EOF was a side effect of its Objective-C roots, and that EOF lost the beauty or simplicity it once had. Third-party tools, such as EOGenerator, help fill the deficiencies introduced by Java (mainly due to the loss of categories). The Objective-C code base was re-introduced with some modifications to desktop application developers as Core Data, part of Apple's Cocoa API, with the release of Mac OS X Tiger in April 2005. == How EOF works == Enterprise Objects provides tools and frameworks for object-relational mapping. The technology specializes in providing mechanisms to retrieve data from various data sources, such as relational databases via JDBC and JNDI directories, and mechanisms to commit data back to those data sources. These mechanisms are designed in a layered, abstract approach that allows developers to think about data retrieval and commitment at a higher level than a specific data source or data source vendor. Central to this mapping is a model file (an "EOModel") that you build with a visual tool — either EOModeler, or the EOModeler plug-in to Xcode. The mapping works as follows: Database tables are mapped to classes. Database columns are mapped to class attributes. Database rows are mapped to objects (or class instances). You can build data models based on existing data sources or you can build data models from scratch, which you then use to create data structures (tables, columns, joins) in a data source. The result is that database records can be transposed into Java objects. The advantage of using data models is that applications are isolated from the idiosyncrasies of the data sources they access. This separation of an application's business logic from database logic allows developers to change the database an application accesses without needing to change the application. EOF provides a level of database transparency not seen in other tools and allows the same model to be used to access different vendor databases and even allows relationships across different vendor databases without changing source code. Its power comes from exposing the underlying data sources as managed graphs of persistent objects. In simple terms, this means that it organizes the application's model layer into a set of defined in-memory data objects. It then tracks changes to these objects and can reverse those changes on demand, such as when a user performs an undo command. Then, when it is time to save changes to the application's data, it archives the objects to the underlying data sources. === Using Inheritance === In designing Enterprise Objects developers can leverage the object-oriented feature known as inheritance. A Customer object and an Employee object, for example, might both inherit certain characteristics from a more generic Person object, such as name, address, and phone number. While this kind of thinking is inherent in object-oriented design, relational databases have no explicit support for inheritance. However, using Enterprise Objects, you can build data models that reflect object hierarchies. That is, you can design database tables to support inheritance by also designing enterprise objects that map to multiple tables or particular views of a database table. == Enterprise Objects (EOs) == An Enterprise Object is analogous to what is often known in object-oriented programming as a business object — a class which models a physical or conceptual object in the business domain (e.g. a customer, an order, an item, etc.). What makes an EO different from other objects is that its instance data maps to a data store. Typically, an enterprise object contains key-value pairs that represent a row in a relational database. The key is basically the column name, and the value is what was in that row in the database. So it can be said that an EO's properties persist beyond the life of any particular running application. More precisely, an Enterprise Object is an instance of a class that implements the com.webobjects.eocontrol.EOEnterpriseObject interface. An Enterprise Object has a corresponding model (called an EOModel) that defines the mapping between the class's object model and the database schema. However, an enterprise object doesn't explicitly know about its model. This level of abstraction means that database vendors can be switched without it affecting the developer's code. This gives Enterprise Objects a high degree of reusability. == EOF and Core Data == Despite their common origins, the two technologies diverged, with each technology retaining a subset of the features of the original Objective-C code base, while adding some new features. === Features Supported Only by EOF === EOF supports custom SQL; shared editing contexts; nested editing contexts; and pre-fetching and batch faulting of relationships, all features of the original Objective-C implementation not supported by Core Data. Core Data also does not provide the equivalent of an EOModelGroup—the NSManagedObjectModel class provides methods for merging models from existing models, and for retrieving merged models from bundles. === Features Supported Only by Core Data === Core Data supports fetched properties; multiple configurations within a managed object model; local stores; and store aggregation (the data for a given entity may be spread across multiple stores); customization and localization of property names and validation warnings; and the use of predicates for property validation. These features of the original Objective-C implementation are not supported by the Java implementation.

    Read more →
  • World Congress of Universal Documentation

    World Congress of Universal Documentation

    The World Congress of Universal Documentation was held from 16 to 21 August 1937 in Paris, France. Delegates from 45 countries met to discuss means by which all of the world's information, in print, in manuscript, and in other forms, could be efficiently organized and made accessible. == The Congress in the history of information science == The Congress, held at the Trocadéro under "the auspices" of the Institut International de Bibliographie, was "the apotheosis" of a general movement in the 1930s towards the classification of the growing mass of information and the improvement of access to that information. For the first time in the history of information science, technological means were beginning to catch up with theoretical ends, and the discussions at the conference reflected that fact. Its participation in the Congress was one of the first projects of the American Documentation Institute (ADI). Participants in the conference discussed what has been more recently called "a continuously updated hypertext encyclopedia." Joseph Reagle sees many of the ideas considered at the conference as forerunners of some of the key goals and norms of Wikipedia. == Microfilm == The main resolution adopted by the congress proposed that microfilm be used to make information universally available. Watson Davis, chairman of the American delegation and president of the ADI, stated that the volume of information being produced created difficult problems of access and preservation, but that these could be solved by the use of microfilm. In his address to the Congress, Davis said: Most immediate and practical to put into operation is the microfilming of material in libraries upon demand. It will become fashionable and economical to send a potential book borrower a little strip of microfilm for his permanent possession instead of the book and then badgering him to return it before he has had a chance to use it effectively. I believe that reading machines for microfilm will become as common as typewriters in studies and laboratories. If the principal libraries and information centers of the world will cooperate in such "bibliofilm services," as they are called, if they exchange orders and have essentially uniform methods, forms for ordering, standard microfilm format and production methods and comparable if not uniform prices, the resources of any library will be placed at the disposal of any scholar or scientist anywhere in the world. All the libraries cooperating will merge into one world library without loss of identity or individuality. The world's documentation will become available to even the most isolated and individualistic scholar. The Congress included two separate exhibits on microfilm. One was of the equipment used at the Bibliothèque nationale de France and the other, coordinated by Herman H. Fussler of the University of Chicago, consisting of "an entire microfilm laboratory," complete with cameras, a darkroom, and various kinds of reading machines. Emanuel Goldberg presented a paper on an early copying camera he had invented. Other resolutions passed by the Congress concerned uniform standards for the preparation of articles, for classifying books and other documents, for indexing newspapers and periodicals, and for cooperation between libraries. == H. G. Wells == In his address to the Congress, H. G. Wells said that he thought that his idea of the "world brain" was a precursor to the ideas other delegates were proposing, and explicitly linked the projects being discussed to the work of the encyclopédistes: I am speaking of a process of mental organization throughout the world which I believe to be as inevitable as anything can be in human affairs. All the distresses and horrors of the present time are fundamentally intellectual. The world has to pull its mind together, and this [Congress] is the beginning of its efforts. Civilization is a Phoenix. It perishes in flames and even as it dies it is born again. This synthesis of knowledge upon which you are working is the necessary beginning of a new world. It is good to be meeting here in Paris where the first encyclopedia of power was made. It would be impossible to overrate our debt to Diderot and his associates. == Other participants == Participants in the Congress included authors, librarians, scholars, archivists, scientists, and editors. Some of the notable people in attendance not mentioned above were:

    Read more →