AI Headshot Generator Free Reddit

AI Headshot Generator Free Reddit — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Software component

    Software component

    A software component is a modular unit of software that encapsulates specific functionality. The desired characteristics of a component are reusability and maintainability. == Value == Components allow software developers to assemble software with reliable parts rather than writing code for every aspect. It makes implementation more like factory assembly than custom building. == Attributes == Desirable attributes of a component include but are not limited to: Cohesive – encapsulates related functionality Reusable Robust Substitutable – can be replaced by another component with the same interface Documented Tested == Third-party == Some components are built in-house by the same organization or team building the software system. Some are third-party, developed elsewhere and assembled into the software system. == Component-based software engineering == For large-scale systems, component-based development encourages a disciplined process to manage complexity. == Framework == Some components conform to a framework technology that allows them to be consumed in a well-known way. Examples include: CORBA, COM, Enterprise JavaBeans, and the .NET Framework. == Modeling == Component design is often modeled visually. In Unified Modeling Language (UML) 2.0 a component is shown as a rectangle, and an interface is shown as a lollipop to indicate a provided interface and as a socket to indicate consumption of an interface. == History == The idea of reusable software components was promoted by Douglas McIlroy in his presentation at the NATO Software Engineering Conference of 1968. (One goal of that conference was to resolve the so-called software crisis of the time.) In the 1970s, McIlroy put this idea into practice with the addition of the pipeline feature to the Unix operating system. Brad Cox refined the concept of a software component in the 1980s. He attempted to create an infrastructure and market for reusable third-party components by inventing the Objective-C programming language. IBM introduced System Object Model (SOM) in the early 1990s. Microsoft introduced Component Object Model (COM) in the early 1990s. Microsoft built many domain-specific component technologies on COM, including Distributed Component Object Model (DCOM), Object Linking and Embedding (OLE), and ActiveX.

    Read more →
  • Emotion recognition

    Emotion recognition

    Emotion recognition is the process of identifying human emotion. People vary widely in their accuracy at recognizing the emotions of others. Use of technology to help people with emotion recognition is a relatively nascent research area. Generally, the technology works best if it uses multiple modalities in context. To date, the most work has been conducted on automating the recognition of facial expressions from video, spoken expressions from audio, written expressions from text, and physiology as measured by wearables. == Human == Humans show a great deal of variability in their abilities to recognize emotion. A key point to keep in mind when learning about automated emotion recognition is that there are several sources of "ground truth", or truth about what the real emotion is. Suppose we are trying to recognize the emotions of Alex. One source is "what would most people say that Alex is feeling?" In this case, the 'truth' may not correspond to what Alex feels, but may correspond to what most people would say it looks like Alex feels. For example, Alex may actually feel sad, but he puts on a big smile and then most people say he looks happy. If an automated method achieves the same results as a group of observers it may be considered accurate, even if it does not actually measure what Alex truly feels. Another source of 'truth' is to ask Alex what he truly feels. This works if Alex has a good sense of his internal state, and wants to tell you what it is, and is capable of putting it accurately into words or a number. However, some people are alexithymic and do not have a good sense of their internal feelings, or they are not able to communicate them accurately with words and numbers. In general, getting to the truth of what emotion is actually present can take some work, can vary depending on the criteria that are selected, and will usually involve maintaining some level of uncertainty. == Automatic == Decades of scientific research have been conducted developing and evaluating methods for automated emotion recognition. There is now an extensive literature proposing and evaluating hundreds of different kinds of methods, leveraging techniques from multiple areas, such as signal processing, machine learning, computer vision, and speech processing. Different methodologies and techniques may be employed to interpret emotion such as Bayesian networks. , Gaussian Mixture models and Hidden Markov Models and deep neural networks. === Approaches === The accuracy of emotion recognition is usually improved when it combines the analysis of human expressions from multimodal forms such as texts, physiology, audio, or video. Different emotion types are detected through the integration of information from facial expressions, body movement and gestures, and speech. The technology is said to contribute in the emergence of the so-called emotional or emotive Internet. The existing approaches in emotion recognition to classify certain emotion types can be generally classified into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. ==== Knowledge-based techniques ==== Knowledge-based techniques (sometimes referred to as lexicon-based techniques), utilize domain knowledge and the semantic and syntactic characteristics of text and potentially spoken language in order to detect certain emotion types. In this approach, it is common to use knowledge-based resources during the emotion classification process such as WordNet, SenticNet, ConceptNet, and EmotiNet, to name a few. One of the advantages of this approach is the accessibility and economy brought about by the large availability of such knowledge-based resources. A limitation of this technique on the other hand, is its inability to handle concept nuances and complex linguistic rules. Knowledge-based techniques can be mainly classified into two categories: dictionary-based and corpus-based approaches. Dictionary-based approaches find opinion or emotion seed words in a dictionary and search for their synonyms and antonyms to expand the initial list of opinions or emotions. Corpus-based approaches on the other hand, start with a seed list of opinion or emotion words, and expand the database by finding other words with context-specific characteristics in a large corpus. While corpus-based approaches take into account context, their performance still vary in different domains since a word in one domain can have a different orientation in another domain. ==== Statistical methods ==== Statistical methods commonly involve the use of different supervised machine learning algorithms in which a large set of annotated data is fed into the algorithms for the system to learn and predict the appropriate emotion types. Machine learning algorithms generally provide more reasonable classification accuracy compared to other approaches, but one of the challenges in achieving good results in the classification process, is the need to have a sufficiently large training set. Some of the most commonly used machine learning algorithms include Support Vector Machines (SVM), Naive Bayes, and Maximum Entropy. Deep learning, which is under the unsupervised family of machine learning, is also widely employed in emotion recognition. Well-known deep learning algorithms include different architectures of Artificial Neural Network (ANN) such as Convolutional Neural Network (CNN), Long Short-term Memory (LSTM), and Extreme Learning Machine (ELM). The popularity of deep learning approaches in the domain of emotion recognition may be mainly attributed to its success in related applications such as in computer vision, speech recognition, and Natural Language Processing (NLP). ==== Hybrid approaches ==== Hybrid approaches in emotion recognition are essentially a combination of knowledge-based techniques and statistical methods, which exploit complementary characteristics from both techniques. Some of the works that have applied an ensemble of knowledge-driven linguistic elements and statistical methods include sentic computing and iFeel, both of which have adopted the concept-level knowledge-based resource SenticNet. The role of such knowledge-based resources in the implementation of hybrid approaches is highly important in the emotion classification process. Since hybrid techniques gain from the benefits offered by both knowledge-based and statistical approaches, they tend to have better classification performance as opposed to employing knowledge-based or statistical methods independently. A downside of using hybrid techniques however, is the computational complexity during the classification process. === Datasets === Data is an integral part of the existing approaches in emotion recognition and in most cases it is a challenge to obtain annotated data that is necessary to train machine learning algorithms. For the task of classifying different emotion types from multimodal sources in the form of texts, audio, videos or physiological signals, the following datasets are available: HUMAINE: provides natural clips with emotion words and context labels in multiple modalities Belfast database: provides clips with a wide range of emotions from TV programs and interview recordings SEMAINE: provides audiovisual recordings between a person and a virtual agent and contains emotion annotations such as angry, happy, fear, disgust, sadness, contempt, and amusement IEMOCAP: provides recordings of dyadic sessions between actors and contains emotion annotations such as happiness, anger, sadness, frustration, and neutral state eNTERFACE: provides audiovisual recordings of subjects from seven nationalities and contains emotion annotations such as happiness, anger, sadness, surprise, disgust, and fear DEAP: provides electroencephalography (EEG), electrocardiography (ECG), and face video recordings, as well as emotion annotations in terms of valence, arousal, and dominance of people watching film clips DREAMER: provides electroencephalography (EEG) and electrocardiography (ECG) recordings, as well as emotion annotations in terms of valence, dominance of people watching film clips MELD: is a multiparty conversational dataset where each utterance is labeled with emotion and sentiment. MELD provides conversations in video format and hence suitable for multimodal emotion recognition and sentiment analysis. MELD is useful for multimodal sentiment analysis and emotion recognition, dialogue systems and emotion recognition in conversations. MuSe: provides audiovisual recordings of natural interactions between a person and an object. It has discrete and continuous emotion annotations in terms of valence, arousal and trustworthiness as well as speech topics useful for multimodal sentiment analysis and emotion recognition. UIT-VSMEC: is a standard Vietnamese Social Media Emotion Corpus (UIT-VSMEC) with about 6,927 human-annotated sentences with six emotion labels, contributing to emotion recognition research in Vietnamese

    Read more →
  • Enterprise data planning

    Enterprise data planning

    Enterprise data planning is the starting point for enterprise wide change. It states the destination and describes how you will get there. It defines benefits, costs and potential risks. It provides measures to be used along the way to judge progress and adjust the journey according to changing circumstances. Data is fundamental to investment enterprises. Effective, economic management of data underpins operations and enables transformations needed to satisfy customer demands, competition and regulation. Data warehouse(s) and other aspects of the overall data architecture are critical to the enterprise. EDMworks has created a strategic data planning approach for the Investment Sector. It consists of a planning process, planning intranets, templates and training materials. EDMworks planning process is based on the belief that extensive domain knowledge significantly shortens planning iterations and enables progressively higher quality plans to be produced and implemented. This approach drives the development of an effective and economic enterprise data architecture. Enterprise data planning is based on proven business disciplines. Key architectural layers for data and applications are then added in order to provide an enterprise wide understanding of the uses and interdependencies of data. This enables the definition of the core components of the EDM plan: Industry structure and business objectives Assessment of systems and services Target architecture for applications, data and infrastructure Target organization structures Systems, database, infrastructure and organizational plans Business case, costs, benefits, results and risks. EDMworks uses several components from the Open Systems Group TOGAF enterprise systems planning process. TOGAF acts as an extension to good business planning methods to provide a framework for the development of the systems and data architectural components. == History == James Martin was one of the pathfinders in data planning methodologies. He was one of the first to identify data as being an enterprise wide asset that required management. He developed a series of tools and methods to support that process. Most of the large consulting firms developed their own methods to address the same basic issue. Frequently, their approaches were incorporated into their own branded system development methodologies that encompassed the complete systems development life-cycle. Others, such as Ed Tozer, developed more focused offerings that dealt with the complexities of extracting key business needs from senior management and then defining relevant architectural visions for the specific enterprise. From these various sources, the concepts of Business, Data, Applications and Technology Architectures emerged. The Open Group Architectural Framework (TOGAF) has taken this work forward and has established a sound method in TOGAF version 9. EDMworks approach is to adopt these planning and architectural practices as a basis and then add two additional dimensions to the planning and implementation focus: Domain knowledge of the Investments sector. Investments is a complex global industry with a common set of characteristics about clients, information vendors, competition and regulation. Domain knowledge significantly improves the quality of the planning and implementation processes Development of people and teams. Change is a major feature of in any Enterprise Data Management program and people and teams both need development in order to make EDM effective throughout an organization.

    Read more →
  • The Citation Project

    The Citation Project

    The Citation Project is a series of studies that measure and analyze first-year college writing students' source use and their ability to understand and implement sources within their own writing. The Citation Project reveals students' source-use habits and the issues that can be seen based on their lack of proper citation skills, such as the prevalence of plagiarism, institution policies, and the results of current writing pedagogy. The Citation Project's central findings were first presented at the Conference on College Composition and Communication in 2012. Although The Citation Project originally referred to this single 2012 study, the feedback received led to the conception of the Project as a broader initiative and as a place to gather and publish studies and data relating to student writing habits for the usage of other researches. == Method == The Citation Project's data comes from the work of 20 researchers analyzing 174 first-year composition students' research papers. The student papers studied originated from 16 institutions across the United States of America, including community colleges, public and private universities, denominational colleges, and Ivy Leagues. Researchers used bibliographic coding to aggregate data regarding the type, length, reading level, and usage of students' sources. == Findings == === Student source assessment and use === This study found that students were capable of identifying, locating, and accessing librarian-approved academic sources, most commonly accessing them with the internet. Despite students demonstrating their ability to find appropriate sources, they tend to exclusively cite the first few pages of their sources. Students' use and analysis of their citations are often limited, frequently resorting to patchwriting, directly restating their source's points, and omitting their own interpretations of their reference's ideas. The Citation Project also highlights students' struggle to accurately determine, address, and value their sources' bias, authority, and credibility. According to the Project's researchers' analysis, these habits demonstrate that first-year college writing students minimally engage with their sources and the academic conversations between them. One researcher from the Citation Project, Rebecca Moore Howard, believes these findings do not point towards students being lazy, but is rather a result of a writing pedagogy that prioritizes efficient, product-focused writing. Another interpretation offered by Sandra Jamieson, another researcher from the Citation Project explains their findings as a result of a lack of adherence to Information Learning (IL) Standards. === Pedagogy === A significant focus of The Citation Project is the development of pedagogical practices intended to equip students with writing and research techniques that will set them up for future success. Writers associated with The Citation Project, such as Tricia Serviss, believe that the practices of teachers surrounding academic integrity and writing practices are what form the foundation of how students think about writing and how to engage with assignments throughout their academic career. They also stress the importance of teaching students to effectively engage with sources rather than simply how to correctly cite them. The Citation Project asserts that endowing students with the ability to read, understand, and synthesize a variety of sources in their writing is a skill that will benefit them throughout their academic careers, and that the surface level typographical focus that many writing programs utilize is inadequate. == Plagiarism == One of the areas that The Citation Project also looks at is how students commit plagiarism throughout their writing. Plagiarism tends to be a checkpoint that gives instructors a sense where students' citation skills stand. Findings from The Citation Project reveal that the most common type of plagiarism is patchwriting which is the act of using the same sentence with only changing a couple of words. These types of issues can be seen as a learning curve due to lack of proper training. Student's that commit plagiarism are often unaware. === Policies === Another issue found is that academic plagiarism policies may not benefit a student's growth but may instead obstruct it. Policies against plagiarism tend to be harsh on the student that committed of offense. Even though student plagiarism is often unintentional academic institutions see this behavior as intentional. Student may then face harsh consequences as a result from their lack of citation knowledge. Additionally, higher level institutions assume that new students already have the skill set to avoid plagiarism which may be an unrealistic expectation. == Legacy == === Inspired studies === ==== Parrott and Napier ==== In one study, "Critical Reading and Student Self-Selected Texts: Results of a Collaborative, Explicit Curricular Approach," Jill Parrot and Trenia Napier quoted the Citation Project's findings as evidence that current collegiate writing curriculums are an ineffective means of teaching students how to properly write academic research papers. The researchers accredited current writing pedagogy's lack of emphasis on teaching critical reading skills. Parrott and Napier tested their thesis by seeing if students would produce more academic writing if they partook in a writing course that taught critical reading. Their results mostly went against this hypothesis, finding students who received additional critical reading training only significantly improved in how they integrated their sources. ==== Kocatepe ==== In May Mehtap Kocatep's study, "Reconceptualising the notion of finding information: How undergraduate students construct information as they read-to-write in an academic writing class," Kocatep expresses that she believes current conversations around writing pedagogy, including the Citation Project, operate with the underlying misconception that information is an easily discoverable static entity and its retrieval is an objective, unbiased decision. Kocatepe instead offers the analysis of what students view as valuable information and if it is worth using is influenced by the socially constructed meanings available to writers at the moment. To further examine students' source engagement, Kocatepe did a study on how female university students from the United Arab Emirates find, retrieve, use, and value sources. Kocatepe's results mainly noted students' almost exclusive reliance on using Google to find sources, as well as how students' navigated mainly English-speaking academic conversations as non-native English speakers. ==== Dahlen, Nordstrom-Sanchez, and Graff ==== Dahlen, Nordstrom-Sanchez, and Graff built their study off The Citation Project research in order to explore the attitudes and practices of students in an undergraduate writing course. As the researchers acknowledge, data collected by the Citation Project was the subject of the bulk of their analysis. This study sought to examine undergraduate writing practices tied to source-usage and elucidate any relevant trends. Dahlen, Nordstrom-Sanchez and Graff found that undergraduate writing students were not engaging with outside sources properly. Key issues discussed include lack of engagement with broad source ideas (in favor of picking out quotes), lack of paraphrasing, and inability to link information between multiple sources. ==== Davis ==== Phillip M. Davis based much of the analysis in his study on data gathered by the Citation Project. This study aimed to examine the particular effects web-based research and study had on undergraduate's papers and the replicability of their bibliographies. Davis sought to see how the shift from physical in-person library based research to online, often at-home research changed the function and usability of the bibliography as a form of documenting source usage in a given work. The primary method of analysis involved examining students' bibliographies to see where they were finding information online and how these sources were accessed. A main issue Davis found was "persistency" of URLs used for online citations. He found that only 18% of URL-based citations continued to function (the others either no longer pointing to the correct document or ceasing to exist altogether) within 3 years of their usage by students, and more than half of claimed online citations could not be found in any form. He suggests that this result brings up questions about how web-based citations should be dealt with in a university context.

    Read more →
  • LumenVox

    LumenVox

    LumenVox is a privately held speech recognition software company based in San Diego, California. LumenVox has been described as one of the market leaders in the speech recognition software industry. == History == LumenVox was founded in 2001 as subsidiary of Progressive Computing. According to LumenVox CEO Edward Miller, when Progressive had initially looked to add speech recognition to its own phone system, it found the existing offerings too expensive and recognized a niche in the market for a more affordable speech recognition product. This led to the development of LumenVox with an aim to bring speech recognition to small-to-midsized businesses. LumenVox is one of the major providers of automatic speech recognition for telephone systems, and as of 2006, became the second largest provider of speech recognition software. == Products == The primary LumenVox product is the LumenVox Speech Engine. It is a speaker-independent automatic speech recognizer that uses the Speech Recognition Grammar Specification for building and defining grammars. It has been integrated with several of the major voice platforms, including Avaya Voice Portal/Interactive Response, Aculab, and BroadSoft's BroadWorks. The Speech Engine was originally derived from CMU Sphinx, but LumenVox has added considerable development effort to make it a commercial-ready product. LumenVox also offers a product called the Speech Tuner, which provides a graphical means of testing and troubleshooting speech recognition applications. == Open source support == LumenVox was recognized as one of the top VoIP companies in 2008 for its work in providing its offerings to the open source community, an effort by the company that began in 2006 when it partnered with Digium. At that time, Digium, maintainer of the open source Asterisk PBX, integrated the LumenVox Speech Engine into Asterisk. This made LumenVox the first commercially available speech recognition engine for Asterisk. As one of the earlier commercial software integrations with Asterisk, the LumenVox integration has been described as one of the applications that helped to mainstream Asterisk. In 2009, LumenVox also began offering access to the Speech Engine as a monthly subscription, bringing the cost of entry down even lower for open source users. LumenVox is also integrated with the open source UniMRCP project, which provides open source client and server libraries for the Media Resource Control Protocol.

    Read more →
  • Applications of artificial intelligence

    Applications of artificial intelligence

    Artificial intelligence is the capability of computational systems to perform tasks that are typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. Artificial intelligence has been used in applications throughout industry and academia. Within the field of Artificial Intelligence, there are multiple subfields. The subfield of machine learning has been used for various scientific and commercial purposes, including language translation, image recognition, decision-making, credit scoring, and e-commerce. In recent years, massive advancements have been made in the field of generative artificial intelligence, which uses generative models to generate text, images, videos, and other forms of data. This article describes applications of AI in different sectors. == Agriculture == In agriculture, AI has been proposed as a way for farmers to identify areas that need irrigation, fertilization, or pesticide treatments to increase yields, thereby improving efficiency. AI has been used to attempt to classify livestock pig call emotions, automate greenhouses, detect diseases and pests, and optimize irrigation. == AI-assisted software develoment == == Architecture and design == == Business == A 2023 study found that generative AI increased productivity by 15% in contact centers. Another 2023 study found it increased productivity by up to 40% in writing tasks. An August 2025 review by MIT found that of surveyed companies, 95% did not report any improvement in revenue from the use of AI. A September 2025 article by the Harvard Business Review describes how increased use of AI does not automatically lead to increases in revenue or actual productivity. Referring to "AI generated work content that masquerades as good work, but lacks the substance to meaningfully advance a given task" the article coins the term workslop. Per studies done in collaboration with the Stanford Social Media Lab, workslop does not improve productivity and undermines trust and collaboration among colleagues. In telehealth, agentic AI is reportedly facilitating the creation of large business models (millions in annual profit) with 1-2 employees, such as MEDVi, which as of August 2025 only had 2 employees and ~$75M in annual profit for GLP-1 weight-loss telehealth services. == Chatbots == == Computer science == === Programming assistance === ==== AI-assisted software development ==== AI can be used for real-time code completion, chat, and automated test generation. These tools are typically integrated with editors and IDEs as plugins. AI-assisted software development systems differ in functionality, quality, speed, and approach to privacy. Creating software primarily via AI is known as "vibe coding". Code created or suggested by AI can be incorrect or inefficient. The use of AI-assisted coding can potentially speed-up software development, but can also slow-down the process by creating more work when debugging and testing. The rush to prematurely adopt AI technology can also incur additional technical debt. AI also requires additional consideration and careful review for cybersecurity, since AI coding software is trained on a wide range of code of inconsistent quality and often replicates poor practices. ==== Neural network design ==== AI can be used to create other AIs. For example, around November 2017, Google's AutoML project to evolve new neural net topologies created NASNet, a system optimized for ImageNet and POCO F1. NASNet's performance exceeded all previously published performance on ImageNet. ==== Quantum computing ==== Research and development of quantum computers has been performed with machine learning algorithms. For example, there is a prototype, photonic, quantum memristive device for neuromorphic computers (NC)/artificial neural networks and NC-using quantum materials with some variety of potential neuromorphic computing-related applications. The use of quantum machine learning for quantum simulators has been proposed for solving physics and chemistry problems. === Historical contributions === AI researchers have created many tools to solve the most difficult problems in computer science. Many of their inventions have been adopted by mainstream computer science and are no longer considered AI. All of the following were originally developed in AI laboratories: Time sharing Interactive interpreters Graphical user interfaces and the computer mouse Rapid application development environments The linked list data structure Automatic storage management Symbolic programming Functional programming Dynamic programming Object-oriented programming Optical character recognition Constraint satisfaction == Customer service == === Human resources === AI programs have been used in hiring processes to screen resumes and rank candidates based on their qualifications, predict a candidate's likelihood of success in a given role, and automate repetitive communication tasks using chatbots. Studies on these programs have identified tendencies for gender bias, favoring male names and male-coded characteristics, as well as bias against disabled candidates and racial minorities. === Online and telephone customer service === AI underlies avatars (automated online assistants) on web pages. It can reduce operation and training costs. Pypestream automated customer service for its mobile application to streamline communication with customers. A Google app analyzes language and converts speech into text. The platform can identify angry customers through their language and respond appropriately. Amazon uses a chatbot for customer service that can perform tasks like checking the status of an order, cancelling orders, offering refunds and connecting the customer with a human representative. Generative AI (GenAI), such as ChatGPT, is increasingly used in business to automate tasks and enhance decision-making. === Hospitality === In the hospitality industry, AI is used to reduce repetitive tasks, analyze trends, interact with guests, and predict customer needs. AI hotel services come in the form of a chatbot, application, virtual voice assistant and service robots. == Education == In educational institutions, AI has been used to automate routine tasks such as attendance tracking, grading, and marking. AI tools have also been used to monitor student progress and analyze learning behaviors, with the goal of facilitating timely interventions for students facing academic challenges. == Energy and environment == === Energy system === The U.S. Department of Energy wrote in an April 2024 report that AI may have applications in modeling power grids, reviewing federal permits with large language models, predicting levels of renewable energy production, and improving the planning process for electrical vehicle charging networks. Other studies have suggested that machine learning can be used for energy consumption prediction and scheduling, e.g. to help with renewable energy intermittency management (see also: smart grid and climate change mitigation in the power grid). === Environmental monitoring === Autonomous ships that monitor the ocean, AI-driven satellite data analysis, passive acoustics or remote sensing and other applications of environmental monitoring make use of machine learning. For example, "Global Plastic Watch" is an AI-based satellite monitoring-platform for analysis/tracking of plastic waste sites to help prevention of plastic pollution – primarily ocean pollution – by helping identify who and where mismanages plastic waste, dumping it into oceans. === Early-warning systems === Machine learning can be used to spot early-warning signs of disasters and environmental issues, possibly including natural pandemics, earthquakes, landslides, heavy rainfall, long-term water supply vulnerability, tipping-points of ecosystem collapse, cyanobacterial bloom outbreaks, and droughts. === Economic and social challenges === The University of Southern California launched the Center for Artificial Intelligence in Society, with the goal of using AI to address problems such as homelessness. Stanford researchers use AI to analyze satellite images to identify high poverty areas. == Entertainment and media == === Media === AI applications analyze media content such as movies, TV programs, advertisement videos or user-generated content. The solutions often involve computer vision. Typical scenarios include the analysis of images using object recognition or face recognition techniques, or the analysis of video for scene recognizing scenes, objects or faces. AI-based media analysis can facilitate media search, the creation of descriptive keywords for content, content policy monitoring (such as verifying the suitability of content for a particular TV viewing time), speech to text for archival or other purposes, and the detection of logos, products or celebrity faces for ad placement. Motion interpolation Pixel-art scaling algorithms Image scaling Imag

    Read more →
  • Scriptella

    Scriptella

    Scriptella is an open source extract transform load (ETL) and script execution tool written in Java. It allows the use of SQL or another scripting language suitable for the data source to perform required transformations. Scriptella does not offer any graphical user interface. == Typical use == Database migration. Database creation/update scripts. Cross-database ETL operations, import/export. Alternative for Ant task. Automated database schema upgrade. == Features == Simple XML syntax for scripts. Add dynamics to your existing SQL scripts by creating a thin wrapper XML file: Support for multiple datasources (or multiple connections to a single database) in an ETL file. Support for many useful JDBC features, e.g. parameters in SQL including file blobs and JDBC escaping. Performance and low memory usage are one of the primary goals. Support for evaluated expressions and properties (JEXL syntax) Support for cross-database ETL scripts by using elements Transactional execution Error handling via elements Conditional scripts/queries execution (similar to Ant if/unless attributes but more powerful) Easy-to-Use as a standalone tool or Ant task, without deployment or installation. Easy-To-Run ETL files directly from Java code. Built-in adapters for popular databases for a tight integration. Support for any database with JDBC/ODBC compliant driver. Service Provider Interface (SPI) for interoperability with non-JDBC DataSources and integration with scripting languages. Out of the box support for JSR 223 (Scripting for the Java Platform) compatible languages. Built-in CSV, TEXT, XML, LDAP, Lucene, Velocity, JEXL and Janino providers. Integration with Java EE, Spring Framework, JMX and JNDI for enterprise ready scripts.

    Read more →
  • Interviewer effect

    Interviewer effect

    The interviewer effect (also called interviewer variance or interviewer error) is the distortion of response to an interviewer-administered data collection effort which results from differential reactions to the social style and personality of interviewers or to their presentation of particular questions. The use of fixed-wording questions is one method of reducing interviewer bias. Anthropological research and case-studies are also affected by the problem, which is exacerbated by the self-fulfilling prophecy, when the researcher is also the interviewer it is also any effect on data gathered from interviewing people that is caused by the behavior or characteristics (real or perceived) of the interviewer. Interviewer effects can also be associated with the characteristics of the interviewer, such as race. Whether black respondents are interviewed by white interviewers or black interviewers has a strong impact on their responses to both attitude questions and behavioral ones. In the latter case, for example, if black respondents are interviewed by black interviewers in pre-election surveys, they are more likely to actually vote in the upcoming election than if they are interviewed by white interviewers. Furthermore, the race of the interviewer can also affect answers to factual questions that might take the form of a test of how informed the respondent is. Black respondents in a survey of political knowledge, for example, get fewer correct answers to factual questions about politics when interviewed by white interviewers than when interviewed by black interviewers. This is consistent with the research literature on stereotype threat, which finds diminished test performance of potentially stigmatised groups when the interviewer or test supervisor is from a perceived higher status group. Interviewer effects can be mitigated somewhat by randomly assigning subjects to different interviewers, or by using tools such as computer-assisted telephone interviewing (CATI).

    Read more →
  • Statistical learning theory

    Statistical learning theory

    Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics. == Introduction == The goals of learning are understanding and prediction. Learning falls into many categories, including supervised learning, unsupervised learning, online learning, and reinforcement learning. From the perspective of statistical learning theory, supervised learning is best understood. Supervised learning involves learning from a training set of data. Every point in the training is an input–output pair, where the input maps to an output. The learning problem consists of inferring the function that maps between the input and the output, such that the learned function can be used to predict the output from future input. Depending on the type of output, supervised learning problems are either problems of regression or problems of classification. If the output takes a continuous range of values, it is a regression problem. Using Ohm's law as an example, a regression could be performed with voltage as input and current as an output. The regression would find the functional relationship between voltage and current to be R {\displaystyle R} , such that V = I R {\displaystyle V=IR} Classification problems are those for which the output will be an element from a discrete set of labels. Classification is very common for machine learning applications. In facial recognition, for instance, a picture of a person's face would be the input, and the output label would be that person's name. The input would be represented by a large multidimensional vector whose elements represent pixels in the picture. After learning a function based on the training set data, that function is validated on a test set of data, data that did not appear in the training set. == Formal description == Take X {\displaystyle X} to be the vector space of all possible inputs, and Y {\displaystyle Y} to be the vector space of all possible outputs. Statistical learning theory takes the perspective that there is some unknown probability distribution over the product space Z = X × Y {\displaystyle Z=X\times Y} , i.e. there exists some unknown p ( z ) = p ( x , y ) {\displaystyle p(z)=p(\mathbf {x} ,y)} . The training set is made up of n {\displaystyle n} samples from this probability distribution, and is notated S = { ( x 1 , y 1 ) , … , ( x n , y n ) } = { z 1 , … , z n } {\displaystyle S=\{(\mathbf {x} _{1},y_{1}),\dots ,(\mathbf {x} _{n},y_{n})\}=\{\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\}} Every x i {\displaystyle \mathbf {x} _{i}} is an input vector from the training data, and y i {\displaystyle y_{i}} is the output that corresponds to it. In this formalism, the inference problem consists of finding a function f : X → Y {\displaystyle f:X\to Y} such that f ( x ) ∼ y {\displaystyle f(\mathbf {x} )\sim y} . Let H {\displaystyle {\mathcal {H}}} be a space of functions f : X → Y {\displaystyle f:X\to Y} called the hypothesis space. The hypothesis space is the space of functions the algorithm will search through. Let V ( f ( x ) , y ) {\displaystyle V(f(\mathbf {x} ),y)} be the loss function, a metric for the difference between the predicted value f ( x ) {\displaystyle f(\mathbf {x} )} and the actual value y {\displaystyle y} . The expected risk is defined to be I [ f ] = ∫ X × Y V ( f ( x ) , y ) p ( x , y ) d x d y {\displaystyle I[f]=\int _{X\times Y}V(f(\mathbf {x} ),y)\,p(\mathbf {x} ,y)\,d\mathbf {x} \,dy} The target function, the best possible function f {\displaystyle f} that can be chosen, is given by the f {\displaystyle f} that satisfies f = argmin h ∈ H ⁡ I [ h ] {\displaystyle f=\mathop {\operatorname {argmin} } _{h\in {\mathcal {H}}}I[h]} Because the probability distribution p ( x , y ) {\displaystyle p(\mathbf {x} ,y)} is unknown, a proxy measure for the expected risk must be used. This measure is based on the training set, a sample from this unknown probability distribution. It is called the empirical risk I S [ f ] = 1 n ∑ i = 1 n V ( f ( x i ) , y i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})} A learning algorithm that chooses the function f S {\displaystyle f_{S}} that minimizes the empirical risk is called empirical risk minimization. == Loss functions == The choice of loss function is a determining factor on the function f S {\displaystyle f_{S}} that will be chosen by the learning algorithm. The loss function also affects the convergence rate for an algorithm. It is important for the loss function to be convex. Different loss functions are used depending on whether the problem is one of regression or one of classification. === Regression === The most common loss function for regression is the square loss function (also known as the L2-norm). This familiar loss function is used in Ordinary Least Squares regression. The form is: V ( f ( x ) , y ) = ( y − f ( x ) ) 2 {\displaystyle V(f(\mathbf {x} ),y)=(y-f(\mathbf {x} ))^{2}} The absolute value loss (also known as the L1-norm) is also sometimes used: V ( f ( x ) , y ) = | y − f ( x ) | {\displaystyle V(f(\mathbf {x} ),y)=|y-f(\mathbf {x} )|} === Classification === In some sense the 0-1 indicator function is the most natural loss function for classification. It takes the value 0 if the predicted output is the same as the actual output, and it takes the value 1 if the predicted output is different from the actual output. For binary classification with Y = { − 1 , 1 } {\displaystyle Y=\{-1,1\}} , this is: V ( f ( x ) , y ) = θ ( − y f ( x ) ) {\displaystyle V(f(\mathbf {x} ),y)=\theta (-yf(\mathbf {x} ))} where θ {\displaystyle \theta } is the Heaviside step function. == Regularization == In machine learning problems, a major problem that arises is that of overfitting. Because learning is a prediction problem, the goal is not to find a function that most closely fits the (previously observed) data, but to find one that will most accurately predict output from future input. Empirical risk minimization runs this risk of overfitting: finding a function that matches the data exactly but does not predict future output well. Overfitting is symptomatic of unstable solutions; a small perturbation in the training set data would cause a large variation in the learned function. It can be shown that if the stability for the solution can be guaranteed, generalization and consistency are guaranteed as well. Regularization can solve the overfitting problem and give the problem stability. Regularization can be accomplished by restricting the hypothesis space H {\displaystyle {\mathcal {H}}} . A common example would be restricting H {\displaystyle {\mathcal {H}}} to linear functions: this can be seen as a reduction to the standard problem of linear regression. H {\displaystyle {\mathcal {H}}} could also be restricted to polynomial of degree p {\displaystyle p} , exponentials, or bounded functions on L1. Restriction of the hypothesis space avoids overfitting because the form of the potential functions are limited, and so does not allow for the choice of a function that gives empirical risk arbitrarily close to zero. One example of regularization is Tikhonov regularization. This consists of minimizing 1 n ∑ i = 1 n V ( f ( x i ) , y i ) + γ ‖ f ‖ H 2 {\displaystyle {\frac {1}{n}}\sum _{i=1}^{n}V(f(\mathbf {x} _{i}),y_{i})+\gamma \left\|f\right\|_{\mathcal {H}}^{2}} where γ {\displaystyle \gamma } is a fixed and positive parameter, the regularization parameter. Tikhonov regularization ensures existence, uniqueness, and stability of the solution. == Bounding empirical risk == Consider a binary classifier f : X → { 0 , 1 } {\displaystyle f:{\mathcal {X}}\to \{0,1\}} . We can apply Hoeffding's inequality to bound the probability that the empirical risk deviates from the true risk to be a Sub-Gaussian distribution. P ( | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 e − 2 n ϵ 2 {\displaystyle \mathbb {P} (|{\hat {R}}(f)-R(f)|\geq \epsilon )\leq 2e^{-2n\epsilon ^{2}}} But generally, when we do empirical risk minimization, we are not given a classifier; we must choose it. Therefore, a more useful result is to bound the probability of the supremum of the difference over the whole class. P ( sup f ∈ F | R ^ ( f ) − R ( f ) | ≥ ϵ ) ≤ 2 S ( F , n ) e − n ϵ 2 / 8 ≈ n d e − n ϵ 2 / 8 {\displaystyle \mathbb {P} {\bigg (}\sup _{f\in {\mathcal {F}}}|{\hat {R}}(f)-R(f)|\geq \epsilon {\bigg )}\leq 2S({\mathcal {F}},n)e^{-n\epsilon ^{2}/8}\approx n^{d}e^{-n\epsilon ^{2}/8}} where S ( F , n ) {\displaystyle S({\mathcal {F}},n)} is the shattering number and n {\displaystyle n} is the number of samples in your dataset. The exponential term comes from Hoeffding but there is an extra cost of taking the supremum over the whole cla

    Read more →
  • Novell Storage Manager

    Novell Storage Manager

    Novell Storage Manager is a system software package released by Novell in 2004 that uses identity, policy and directory events to automate full lifecycle management of file storage for individual users and organizational groups. By tying storage management to an organization's existing identity infrastructure, it has been pointed out, Novell Storage Manager enables the administration of users across all file servers "as a single pool rather than [in] separate independently managed domains." Novell Storage Manager is a component of the Novell File Management Suite. == How It Works == Novell Storage Manager dynamically manages and provisions storage based on user and group events that occur in the directory, including user creations, group assignments, moves, renames, and deletions. When a change happens in the directory that affects a user’s file storage needs or user storage policy, Storage Manager applies the appropriate policy and makes the necessary changes at the file system level to address those storage needs. The following key components comprise Novell Storage Manager's identity and policy-driven state machine architecture: Directory services; Storage policies; Novell Storage Manager event monitors; Novell Storage Manager policy engine; Novell Storage Manager agents; and Action objects. This state machine architecture enables the engine to properly deal with transient waits with directory synchronization issues. It also allows recovery from failures involving network communications, a target server or a server running a component of Storage Manager—including the policy engine itself. If a failure or interruption occurs at any point during operation, Storage Manager will be able to successfully continue the operation from where it was when the interruption occurred. == Reviews == Jon Toigo called Novell Storage Manager "a robust and smart approach to corralling user files... into an organized and efficient management scheme". He also said it was "best in class" of the products he'd reviewed.

    Read more →
  • AVT Statistical filtering algorithm

    AVT Statistical filtering algorithm

    AVT Statistical filtering algorithm is an approach to improving quality of raw data collected from various sources. It is most effective in cases when there is inband noise present. In those cases AVT is better at filtering data then, band-pass filter or any digital filtering based on variation of. Conventional filtering is useful when signal/data has different frequency than noise and signal/data is separated/filtered by frequency discrimination of noise. Frequency discrimination filtering is done using Low Pass, High Pass and Band Pass filtering which refers to relative frequency filtering criteria target for such configuration. Those filters are created using passive and active components and sometimes are implemented using software algorithms based on Fast Fourier transform (FFT). AVT filtering is implemented in software and its inner working is based on statistical analysis of raw data. When signal frequency/(useful data distribution frequency) coincides with noise frequency/(noisy data distribution frequency) we have inband noise. In this situations frequency discrimination filtering does not work since the noise and useful signal are indistinguishable and where AVT excels. To achieve filtering in such conditions there are several methods/algorithms available which are briefly described below. == Averaging algorithm == Collect n samples of data Calculate average value of collected data Present/record result as actual data == Median algorithm == Collect n samples of data Sort the data in ascending or descending order. Note that order does not matter Select the data that happen to be in n/2 position and present/record it as final result representing data sample == AVT algorithm == AVT algorithm stands for Antonyan Vardan Transform and its implementation explained below. Collect n samples of data Calculate the standard deviation and average value Drop any data that is greater or less than average ± one standard deviation Calculate average value of remaining data Present/record result as actual value representing data sample This algorithm is based on amplitude discrimination and can easily reject any noise that is not like actual signal, otherwise statistically different than 1 standard deviation of the signal. Note that this type of filtering can be used in situations where the actual environmental noise is not known in advance. Notice that it is preferable to use the median in above steps than average. Originally the AVT algorithm used average value to compare it with results of median on the data window. == Filtering algorithms comparison == Using a system that has signal value of 1 and has noise added at 0.1% and 1% levels will simplify quantification of algorithm performance. The R script is used to create pseudo random noise added to signal and analyze the results of filtering using several algorithms. Please refer to "Reduce Inband Noise with the AVT Algorithm" article for details. This graphs show that AVT algorithm provides best results compared with Median and Averaging algorithms while using data sample size of 32, 64 and 128 values. Note that this graph was created by analyzing random data array of 10000 values. Sample of this data is graphically represented below. From this graph it is apparent that AVT outperforms other filtering algorithms by providing 5% to 10% more accurate data when analyzing same datasets. Considering random nature of noise used in this numerical experiment that borderlines worst case situation where actual signal level is below ambient noise the precision improvements of processing data with AVT algorithm are significant. == AVT algorithm variations == === Cascaded AVT === In some situations better results can be obtained by cascading several stages of AVT filtering. This will produce singular constant value which can be used for equipment that has known stable characteristics like thermometers, thermistors and other slow acting sensors. === Reverse AVT === Collect n samples of data Calculate the standard deviation and average value Drop any data that is within one standard deviation ± average band Calculate average value of remaining data Present/record result as actual data This is useful for detecting minute signals that are close to background noise level. == Possible applications and uses == Use to filter data that is near or below noise level Used in planet detection to filter out raw data from the Kepler space telescope Filter out noise from sound sources where all other filtering methods (Low-pass filter, High-pass filter, Band-pass filter, Digital filter) fail. Pre-process scientific data for data analysis (Smoothness) before plotting see (Plot (graphics)) Used in SETI (Search for extraterrestrial intelligence) for detecting/distinguishing extraterrestrial signals from cosmic background Use AVT as image filtering algorithm to detect altered images. This image of Jupiter generated from this program, detecting alterations in original picture that was modified to be visually appealing by applying filters. Another version of this comparison is the Reverse AVT filter applied to the same original Jupiter Image, where we only see that altered portion as Noise that was eliminated by AVT algorithm. Use AVT as image filtering algorithm to estimate data density from images. Picture of Pillars of Creation Nebula shows data density in filtered images from Hubble and Webb. Note that image on the left has big patches of missing data marked with simpler color patterns.

    Read more →
  • Data management plan

    Data management plan

    A data management plan or DMP is a formal document that outlines how data are to be handled both during a research project, and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this may lead to data being well-managed in the present, and prepared for preservation in the future. DMPs were originally used in 1966 to manage aeronautical and engineering projects' data collection and analysis, and expanded across engineering and scientific disciplines in the 1970s and 1980s. Up until the early 2000s, DMPs were used "for projects of great technical complexity, and for limited mid-study data collection and processing purposes". In the 2000s and later, E-research and economic policies drove the development and uptake of DMPs. == Importance == Preparing a data management plan before data are collected is claimed to ensure that data are in the correct format, organized well, and better annotated. This could arguably save time in the long term because there is no need to re-organize, re-format, or try to remember details about data. It is also claimed to increase research efficiency since both the data collector and other researchers might be able to understand and use well-annotated data in the future. One component of a data management plan is data archiving and preservation. By deciding on an archive ahead of time, the data collector can format data during collection to make its future submission to a database easier. If data are preserved, they are more relevant since they can be re-used by other researchers. It also allows the data collector to direct requests for data to the database, rather than address requests individually. A frequent argument in favor of preservation is that data that are preserved have the potential to lead to new, unanticipated discoveries, and they prevent duplication of scientific studies that have already been conducted. Data archiving also provides insurance against loss by the data collector. In the 2010s, funding agencies increasingly required data management plans as part of the proposal and evaluation process, despite little or no evidence of their efficacy. == Major components == "There is no general and definitive list of topics that should be covered in a DMP for a research project", and researchers are often left to their own devices as to how to fill out a DMP. === Information about data and data format === A description of data to be produced by the project. This might include (but is not limited to) data that are: Experimental Observational Raw or derived Physical collections Models Simulations Curriculum materials Software Images How will the data be acquired? When and where will they be acquired? After collection, how will the data be processed? Include information about Software used Algorithms Scientific workflows File formats that will be used, justify those formats, and describe the naming conventions used. Quality assurance & quality control measures that will be taken during sample collection, analysis, and processing. If existing data are used, what are their origins? How will the data collected be combined with existing data? What is the relationship between the data collected and existing data? How will the data be managed in the short-term? Consider the following: Version control for files Backing up data and data products Security & protection of data and data products Who will be responsible for management === Metadata content and format === Metadata are the contextual details, including any information important for using data. This may include descriptions of temporal and spatial details, instruments, parameters, units, files, etc. Metadata is commonly referred to as "data about data". Issues to be considered include: How detailed has the metadata to be in order to make the data meaningful? How will the metadata be created and/or captured? Examples include lab notebooks, GPS hand-held units, Auto-saved files on instruments, etc. What format will be used for the metadata? What are the metadata standards commonly used in the respective scientific discipline? There should be justification for the format chosen. === Policies for access, sharing, and re-use === Describe any obligations that exist for sharing data collected. These may include obligations from funding agencies, institutions, other professional organizations, and legal requirements. Include information about how data will be shared, including when the data will be accessible, how long the data will be available, how access can be gained, and any rights that the data collector reserves for using data. Address any ethical or privacy issues with data sharing Address intellectual property & copyright issues. Who owns the copyright? What are the institutional, publisher, and/or funding agency policies associated with intellectual property? Are there embargoes for political, commercial, or patent reasons? Describe the intended future uses/users for the data Indicate how the data should be cited by others. How will the issue of persistent citation be addressed? For example, if the data will be deposited in a public archive, will the dataset have a persistent identifier (e.g., ARK, DOI, Handle, PURL, URN) assigned to it? === Long-term storage and data management === Researchers should identify an appropriate archive for the long-term preservation of their data. By identifying the archive early in the project, the data can be formatted, transformed, and documented appropriately to meet the requirements of the archive. Researchers should consult colleagues and professional societies in their discipline to determine the most appropriate database, and include a backup archive in their data management plan in case their first choice goes out of existence. Early in the project, the primary researcher should identify what data will be preserved in an archive. Usually, preserving the data in its most raw form is desirable, although data derivatives and products can also be preserved. An individual should be identified as the primary contact person for archived data, and ensure contact information is always kept up-to-date in case there are requests for data or information about data. === Budget === Data management and preservation costs may be considerable, depending on the nature of the project. By anticipating costs ahead of time, researchers ensure that the data will be properly managed and archived. Potential expenses that should be considered are Human resources and staff as they handle data preparation, management, documentation, and preservation Hardware and/or software needed for data management, backing up, security, documentation, and preservation Costs associated with submitting the data to an archive The data management plan should include how these costs will be paid. == NSF Data Management Plan == All grant proposals submitted to National Science Foundation (NSF) must include a Data Management Plan that is no more than two pages. This is a supplement (not part of the 15-page proposal) and should describe how the proposal will conform to the Award and Administration Guide policy (see below). It may include the following: The types of data The standards to be used for data and metadata format and content Policies for access and sharing Policies and provisions for re-use Plans for archiving data Policy summarized from the NSF Award and Administration Guide, Section 4 (Dissemination and Sharing of Research Results): Promptly publish with appropriate authorship Share data, samples, physical collections, and supporting materials with others, within a reasonable time frame Share software and inventions Investigators can keep their legal rights over their intellectual property, but they still have to make their results, data, and collections available to others Policies will be implemented via Proposal review Award negotiations and conditions Support/incentives == ESRC Data Management Plan == Since 1995, the UK's Economic and Social Research Council (ESRC) have had a research data policy in place. The current ESRC Research Data Policy states that research data created as a result of ESRC-funded research should be openly available to the scientific community to the maximum extent possible, through long-term preservation and high-quality data management. ESRC requires a data management plan for all research award applications where new data are being created. Such plans are designed to promote a structured approach to data management throughout the data lifecycle, resulting in better quality data that is ready to archive for sharing and re-use. The UK Data Service, the ESRC's flagship data service, provides practical guidance on research data management planning suitable for social science researchers in the UK and around the world. ESRC has a longstanding arrangement with the UK Data A

    Read more →
  • Floyd–Steinberg dithering

    Floyd–Steinberg dithering

    Floyd–Steinberg dithering is an image dithering algorithm first published in 1976 by Robert W. Floyd and Louis Steinberg. It is commonly used by image manipulation software, for example, when converting an image from a Truecolor 24-bit PNG format into a GIF format, which is restricted to a maximum of 256 colors. == Implementation == The algorithm achieves dithering using error diffusion, meaning it pushes (adds) the residual quantization error of a pixel onto its neighboring pixels, to be quantized after. It spreads the debt out according to the distribution (shown as a map of the neighboring pixels): [ ∗ 7 16 … … 3 16 5 16 1 16 … ] {\displaystyle {\begin{bmatrix}&&&{\frac {\displaystyle 7}{\displaystyle 16}}&\ldots \\\ldots &{\frac {\displaystyle 3}{\displaystyle 16}}&{\frac {\displaystyle 5}{\displaystyle 16}}&{\frac {\displaystyle 1}{\displaystyle 16}}&\ldots \\\end{bmatrix}}} The pixel indicated with a star () indicates the pixel currently being scanned, and the blank pixels are the previously scanned pixels. The specific values (7/16, 3/16, 5/16, 1/16) were originally found by trial-and-error, "guided by the desire to have a region of desired density 0.5 come out as a checkerboard pattern". The algorithm scans the image from left to right, top to bottom, quantizing pixel values one by one. Each time, the quantization error is transferred to the neighboring pixels, while not affecting the pixels that already have been quantized. Hence, if a number of pixels have been rounded downwards, it becomes more likely that the next pixel is rounded upwards, such that on average, the quantization error is close to zero. The diffusion coefficients have the property that if the original pixel values are exactly halfway in between the nearest available colors, the dithered result is a checkerboard pattern. For example, 50% grey data could be dithered as a black-and-white checkerboard pattern. For optimal dithering, the counting of quantization errors should be in sufficient accuracy to prevent rounding errors from affecting the result. For correct results, all values should be linearized first, rather than operating directly on sRGB values as is common for images stored on computers. In some implementations, the horizontal direction of scan alternates between lines; this is called "serpentine scanning" or boustrophedon transform dithering. The algorithm described above is in the following pseudocode. This works for any approximately linear encoding of pixel values, such as 8-bit integers, 16-bit integers or real numbers in the range [0, 1]. for each y from top to bottom do for each x from left to right do oldpixel := pixels[x][y] newpixel := find_closest_palette_color(oldpixel) pixels[x][y] := newpixel quant_error := oldpixel - newpixel pixels[x + 1][y ] := pixels[x + 1][y ] + quant_error × 7 / 16 pixels[x - 1][y + 1] := pixels[x - 1][y + 1] + quant_error × 3 / 16 pixels[x ][y + 1] := pixels[x ][y + 1] + quant_error × 5 / 16 pixels[x + 1][y + 1] := pixels[x + 1][y + 1] + quant_error × 1 / 16 When converting grayscale pixel values from a high to a low bit depth (e.g. 8-bit grayscale to 1-bit black-and-white), find_closest_palette_color() may perform just a simple rounding, for example: find_closest_palette_color(oldpixel) = round(oldpixel / 255) The pseudocode can result in pixel values exceeding the valid values (such as greater than 255 in 8-bit grayscale images). Such values should ideally be handled by the find_closest_palette_color() function, rather than clipping the intermediate values, since a subsequent error may bring the value back into range. However, if fixed-width integers are used, wrapping of intermediate values would cause inversion of black and white, and so should be avoided. The find_closest_palette_color() implementation is nontrivial for a palette that is not evenly distributed, however small inaccuracies in selecting the correct palette color have minimal visual impact due to error being propagated to future pixels. A nearest neighbor search in 3D is frequently used.

    Read more →
  • ArchiMate

    ArchiMate

    ArchiMate ( AR-ki-mayt) is an open and independent enterprise architecture modeling language to support the description, analysis and visualization of architecture within and across business domains in an unambiguous way. ArchiMate is a technical standard from The Open Group and is based on concepts from the now superseded IEEE 1471 standard. It is supported by various tool vendors and consulting firms. ArchiMate is also a registered trademark of The Open Group. The Open Group has a certification program for ArchiMate users, software tools and courses. ArchiMate distinguishes itself from other languages such as Unified Modeling Language (UML) and Business Process Modeling and Notation (BPMN) by its enterprise modelling scope. Also, UML and BPMN are meant for a specific use and they are quite heavy – containing about 150 (UML) and 250 (BPMN) modeling concepts whereas ArchiMate works with just about 50 (in version 2.0). The goal of ArchiMate is to be ”as small as possible”, not to cover every edge scenario imaginable. To be easy to learn and apply, ArchiMate was intentionally restricted “to the concepts that suffice for modeling the proverbial 80% of practical cases". == Overview == ArchiMate offers a common language for describing the construction and operation of business processes, organizational structures, information flows, IT systems, and technical infrastructure. This insight helps the different stakeholders to design, assess, and communicate the consequences of decisions and changes within and between these business domains. The main concepts and relationships of the ArchiMate language can be seen as a framework, the so-called Archimate Framework: It divides the enterprise architecture into a business, application and technology layer. In each layer, three aspects are considered: active elements, an internal structure and elements that define use or communicate information. One of the objectives of the ArchiMate language is to define the relationships between concepts in different architecture domains. The concepts of this language therefore hold the middle between the detailed concepts, which are used for modeling individual domains (for example, the Unified Modeling Language (UML) for modeling software products), and Business Process Model and Notation (BPMN), which is used for business process modeling. == History == ArchiMate is partly based on the now superseded IEEE 1471 standard. It was developed in the Netherlands by a project team from the Telematica Instituut in cooperation with several Dutch partners from government, industry and academia. Among the partners were Ordina NV, Radboud Universiteit Nijmegen, the Leiden Institute for Advanced Computer Science (LIACS) and the Centrum Wiskunde & Informatica (CWI). Later, tests were performed in organizations such as ABN AMRO, the Dutch Tax and Customs Administration and the ABP. The development process lasted from July 2002 to December 2004, and took about 35 person years and approximately 4 million euros. The development was funded by the Dutch government (Dutch Tax and Customs Administration), and business partners, including ABN AMRO and the ABP Pension Fund. In 2008 the ownership and stewardship of ArchiMate was transferred to The Open Group. It is now managed by the ArchiMate Forum within The Open Group. In February 2009 The Open Group published the ArchiMate 1.0 standard as a formal technical standard. In January 2012 the ArchiMate 2.0 standard, and in 2013 the ArchiMate 2.1 standard was released. In June 2016, the Open Group released version 3.0 of the ArchiMate Specification. An update to Archimate 3.0.1 came out in August 2017. Archimate 3.1 was published 5 November 2019. The latest version of the ArchiMate Specification is version 3.2 released October 2022. Version 3.0 adds enhanced support for capability-oriented strategic modelling, new entities representing physical resources (for modelling the ingredients, equipment and transport resources used in the physical world) and a generic metamodel showing the entity types and the relationships between them. == ArchiMate framework == === Core framework === The main concepts and elements of the ArchiMate language are being presented as ArchiMate core framework. It consists of three layers and three aspects. This creates a matrix of combinations. Every layer has its passive structure, behavior and active structure aspects. ==== Layers ==== ArchiMate has a layered and service-oriented look on architectural models. The higher layers make use of services that are provided by the lower layers. Although, at an abstract level, the concepts that are used within each layer are similar, we define more concrete concepts that are specific for a certain layer. In this context, we distinguish three main layers: The business layer is about business processes, services, functions and events of business units. This layer "offers products and services to external customers, which are realized in the organization by business processes performed by business actors and roles". The application layer is about software applications that "support the components in the business with application services". The technology layer deals "with the hardware and communication infrastructure to support the application layer. This layer offers infrastructural services needed to run applications, realized by computer and communication hardware and system software". Each of these main layers can be further divided in sub-layers. For example, in the business layer, the primary business processes realising the products of a company may make use of a layer of secondary (supporting) business processes; in the application layer, the end-user applications may make use of generic services offered by supporting applications. On top of the business layer, a separate environment layer may be added, modelling the external customers that make use of the services of the organisation (although these may also be considered part of the business layer). In line with service orientation, the most important relation between layers is formed by use relations, which show how the higher layers make use of the services of lower layers. However, a second type of link is formed by realisation relations: elements in lower layers may realise comparable elements in higher layers; e.g., a ‘data object’ (application layer) may realise a ‘business object’ (business layer); or an ‘artifact’ (technology layer) may realise either a ‘data object’ or an ‘application component’ (application layer). ==== Aspects ==== Passive structure is the set of entities on which actions are conducted. In the business layer the example would be information objects, in the application layer data objects and in the technology layer, they could include physical objects. Behavior refers to the processes and functions performed by the actors. "Structural elements are assigned to behavioral elements, to show who or what displays the behavior". Active structure is the set of entities that display some behavior, e.g. business actors, devices, or application components. === Full framework === The Full ArchiMate framework is enriched by the physical layer, which was added to allow modeling of “physical equipment, materials, and distribution networks” and was not present in the previous version. The implementation and migration layer adds elements that allow architects to model a state of transition, to mark parts of the architecture that are temporary for the purpose, as the name says, of implementation and migration. Strategy layer adds three elements: resource, capability and course of action. These elements help to incorporate strategic dimension to the ArchiMate language by allowing it to depict the usage of resources and capabilities in order to achieve some strategic goals. Finally, there is a motivation aspect that allows different stakeholders to describe the motivation of specific actors or domains, which can be quite important when looking at one thing from several different angles. It adds several elements like stakeholder, value, driver, goal, meaning etc. == ArchiMate language == The ArchiMate language is formed as a top-level and is hierarchical. On the top, there is a model. A model is a collection of concepts. A concept can be either an element or a relationship. An element can be either of behavior type, structure, motivation or a so-called composite element (which means that it does not fit just one aspect of the framework, but two or more). The functionality of all concepts without a dependency on a specific layer is described by the generic metamodel. This layer-independent description of concepts is useful when trying to understand the mechanics of the Archimate language. === Concepts === ==== Elements ==== The generic elements are distributed into the same categories as the layers: Active structure elements Behavior elements Passive structure elements Motivation elements Active structure e

    Read more →
  • Virtual facility

    Virtual facility

    A Virtual Facility (VF) is a highly realistic digital representation of a data center, used to model all relevant aspects of a physical data center with a high degree of precision. The term "virtual" in Virtual Facility refers to its use of virtual reality, rather than the abstraction of computer resources as seen in platform virtualization. The VF mirrors the characteristics of a physical facility over time and allows for detailed analysis and modeling. == VF Model features == A standard VF model includes: Three-dimensional physical facility layout Network connectivity of facility equipment Full inventory of facility equipment, including electronics and electrical systems such as power distribution units (PDUs) and uninterruptible power supplies (UPSs) Full air conditioning system (ACUs) and controls within the room The term Virtual Facility was introduced to address the emerging environmental problems facing modern Mission Critical Facilities (MCFs). This concept combines virtual reality (VR), computer simulation, and expert systems applied to the domain of facilities. The VF type of computer simulation allows for detailed analysis and prototyping of airflow in the data center using computational fluid dynamics (CFD) techniques. This enables the visualization and numerical analysis of airflow and temperatures within the facility, helping to predict real-world outcomes. == VF applications == The VF model can be used to assist with the following: Greenfield design Asset management Troubleshooting existing data centers Making existing data centers more resilient Making existing data centers more energy efficient Cost prediction Staff training Capacity planning Load growth management Many organizations use VF models to virtually assess scenarios before committing resources to physical changes. This allows for better decision-making regarding the addition or modification of equipment, helping to avoid logistical or thermal problems.

    Read more →