AI Chat Exporter Chrome Extension

AI Chat Exporter Chrome Extension — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Text-to-image personalization

    Text-to-image personalization

    Text-to-Image personalization is a task in deep learning for computer graphics that augments pre-trained text-to-image generative models. In this task, a generative model that was trained on large-scale data (usually a foundation model), is adapted such that it can generate images of novel, user-provided concepts. These concepts are typically unseen during training, and may represent specific objects (such as the user's pet) or more abstract categories (new artistic style or object relations). Text-to-Image personalization methods typically bind the novel (personal) concept to new words in the vocabulary of the model. These words can then be used in future prompts to invoke the concept for subject-driven generation, inpainting, style transfer and even to correct biases in the model. To do so, models either optimize word-embeddings, fine-tune the generative model itself, or employ a mixture of both approaches. == Technology == Text-to-Image personalization was first proposed during August 2022 by two concurrent works, Textual Inversion and DreamBooth. In both cases, a user provides a few images (typically 3–5) of a concept, like their own dog, together with a coarse descriptor of the concept class (like the word "dog"). The model then learns to represent the subject through a reconstruction based objective, where prompts referring to the subject are expected to reconstruct images from the training set. In Textual Inversion, the personalized concepts are introduced into the text-to-image model by adding new words to the vocabulary of the model. Typical text-to-image models represent words (and sometimes parts-of-words) as tokens, or indices in a predefined dictionary. During generation, an input prompt is converted into such tokens, each of which is converted into a ‘word-embedding’: a continuous vector representation which is learned for each token as part of the model's training. Textual Inversion proposes to optimize a new word-embedding vector for representing the novel concept. This new embedding vector can then be assigned to a user-chosen string, and invoked whenever the user's prompt contains this string. In DreamBooth, rather than optimizing a new word vector, the full generative model itself is fine-tuned. The user first selects an existing token, typically one which rarely appears in prompts. The subject itself is then represented by a string containing this token, followed by a coarse descriptor of the subject's class. A prompt describing the subject will then take the form: "A photo of " (e.g. "a photo of sks cat" when learning to represent a specific cat). The text-to-image model is then tuned so that prompts of this form will generate images of the subject. == Textual Inversion == The key idea in Textual Inversion is to add a new term to the vocabulary of the diffusion model that corresponds to the new (personalized) concept. Textual Inversion operates by inverting the concepts into new pseudo-words within the textual embedding space of a pre-trained text-to-image model. These pseudo-words can be injected into new scenes using simple natural language descriptions, allowing for simple and intuitive modifications. The method allows a user to leverage multi-modal information — using a text-driven interface for ease of editing, but providing visual cues when approaching the limits of natural language. The resulting model is extremely light-weight per concept: only 1K long, but succeeds to encode detailed visual properties of the concept. == Extensions == Several approaches were proposed to refine and improve over the original methods. These include the following. Low-rank Adaptation (LoRA) - an adapter-based technique for efficient finetuning of models. In the case of text-to-image models, LoRA is typically used to modify the cross-attention layers of a diffusion model. Perfusion - a low rank update method that also locks the activations of the key matrix in the diffusion model's cross attention layers to the concept's coarse class. Extended Textual Inversion - a technique that learns an individual word embedding for each layer in the diffusion model's denoising network. Encoder-based methods that use another neural network to quickly personalize a model == Challenges and limitations == Text-to-image personalization methods must contend with several challenges. At their core is the goal of achieving high-fidelity to the personal concept while maintaining high alignment between novel prompts containing the subject, and the generated images (typically referred to as ‘editability’). Another challenge that personalization methods must contend with is memory requirements. Initial implementations of personalization methods required more than 20 Gigabytes of GPU memory, and more recent approaches have reported requirements of more than 40 Gigabytes. However, optimizations such as Flash Attention have since reduced this requirement considerably. Approaches that tune the entire generative model may also create checkpoints that are several gigabytes in size, making it difficult to share or store many models. Embedding based approaches require only a few kilobytes, but typically struggle to preserve identity while maintaining editability. More recent approaches have proposed hybrid tuning goals which optimize both an embedding and a subset of network weights. These can reduce storage requirements to as little as 100 Kilobytes while achieving quality comparable to full tuning methods. Finally, optimization processes can be lengthy, requiring several minutes of tuning for each novel concept. Encoder and quick-tuning methods aim to reduce this to seconds or less.

    Read more →
  • Influencer

    Influencer

    An influencer is an individual who has the capacity to shape the attitudes, behavior, or decisions of others through authority, knowledge, position, or the nature of the relationship with the audience. The term is used in various fields such as media, business, politics, religion, and communication, referring to influencers such as social media influencers, podcasters, public speakers, religious influencers, writers, and newsletter writers etc who have dedicated followings in various areas. One writer defines influencers as "a range of third parties who exercise influence over the organization and its potential customers." Another writer defines an influencer as a "third party who significantly shapes the customer's purchasing decision but may never be accountable for it." According to another writer, influencers are "well-connected, create an impact, have active minds, and are trendsetters". Just because a person has many followers does not necessarily mean they have much influence over those people. In contemporary usage, the term frequently refers to a social media influencer, (also known as an online influencer or simply influencer) a person who builds a grassroots online presence through engaging content such as photos, videos, and updates. This is done by using direct audience interaction to establish authenticity, expertise, and appeal, and by standing apart from traditional celebrities by growing their platform through social media rather than pre-existing fame. The modern referent of the term is commonly a paid role in which a business entity pays for the social media influence-for-hire activity to promote its products and services, known as influencer marketing. A 1% increase in spending on influencer marketing can lead to a 0.5% increase in audience engagement. As such, an influencer effectively acts as a modern salesperson or a marketer. Types of influencers include fashion influencer, travel influencer, and virtual influencer, and they involve content creators and streamers. Some influencers are associated primarily with specific social media apps such as TikTok, Instagram, or Pinterest; many influencers are also considered internet celebrities. As of 2023, Instagram is the social media platform businesses spend the most advertising money towards marketing with influencers. However, influencers can have an impact on any social media network. == History == === Origins === The word influencer in its general sense of a person or thing that exerts influence, is attested in historical sources at least since the 17th century. The Oxford English Dictionary (OED) gives 1664 as the earliest example of usage and cites a sentence from Henry More's A Modest Enquiry into the Mystery of Iniquity: "The head and influencer of the whole Church". The origins of online influencing can be traced back to the emergence of digital blogs and platforms in the early 2000s. Nevertheless, recent studies demonstrate that Instagram, an application with more than one billion users, harbors the majority of the influencer demographic. These individuals are sometimes referred to as "Instagrammers" or "Instafamous". A crucial aspect of influencing is their association with sponsors. The 2015 debut of Vamp, a company that links influencers with sponsorships, transformed the landscape of influencing. There is much debate about whether social media influencers can be considered celebrities, as their path to fame is often less traditional and arguably easier. Melody Nouri addressed the differences between the two types in her article "The Power of Influence: Traditional Celebrities vs Social Media Influencer". Nouri asserts that social media platforms have a greater negative impact on young, impressionable audiences in comparison with traditional media such as magazines, billboards, advertisements, and tabloids featuring celebrities. Online, it is thought to be simpler to manipulate an image and lifestyle in such a way that viewers are more susceptible to believing it. One theory considers the former American First Lady Eleanor Roosevelt (1884–1962) to be the "original media influencer." While she achieved celebrity in her role as First Lady, she built a global personal brand as a wise, informative, trustworthy American woman. Her voice was her own, unrestricted by political advisors and powerful men, and with it, Roosevelt exerted unprecedented social and cultural influence in radio, print, public speaking, film, and television until she died. In one notable example, it may have been Roosevelt's television support of John F. Kennedy which nudged his "hairline victory" during the 1960 Presidential campaign. In another example, David Ogilvy paid Roosevelt more than a quarter of a million dollars in today's currency to make a TV commercial for Good Luck margarine (1959), in which Roosevelt also managed to mention world hunger. As a content creator, she wrote My Day, a popular daily newspaper column that ran nationwide for twenty-six years. Like a social media post, My Day covered all aspects of her life, and in it Roosevelt often recommended movies, books, and products that she admired. Roosevelt also had a hand in designing all three of her public affairs television shows. Unlike contemporary influencers, she was less motivated by a pay-to-play situation than by a desire to educate and inspire; but she did use her influence to benefit the entertainment industry careers of her children, and she welcomed the revenue that her influence bought, most of which was donated to charity. === 2000s === The early 2000s showed corporate endeavors to leverage the internet for influence, with some companies participating in forums for promotions or providing bloggers with complimentary products in return for favorable reviews. A few of these practices were viewed as unethical for taking advantage of the labor of young individuals without providing remuneration. In 2004, The Blogstar Network was established by Ted Murphy of MindComet. Bloggers were encouraged to join an email list and receive remunerated offers from corporations in exchange for creating specific posts. For instance, bloggers were compensated for writing reviews of fast-food meals on their blogs. Blogstar is widely regarded as the first influencer marketing network. Murphy succeeded Blogstar with PayPerPost, which was introduced in 2006. This platform compensated significant posters on prominent forums and social media platforms for every post made about a corporate product. Payment rates were determined by the influencer's status. Though very popular, PayPerPost, received a great deal of criticism as these influencers were not required to disclose their involvement with PayPerPost as traditional journalism would have. With the success of PayPerPost, the public became aware that there was a drive for corporate interests to influence what some people were posting to these sites. The platform also incentivized other firms to establish comparable programs. Despite concerns, marketing networks with influencers continued to grow throughout the 2000s and into the 2010s. The influencer marketing industry was worth as much as $8 billion in 2019, according to estimates from Business Insider Intelligence, which are based on Mediakix data. Evan Asano, the Former CEO and founder of the agency Mediakix, previously spoke with Business Insider and said he believed influencer marketing on Instagram would continue to grow despite likes being hidden. === 2010s === By the 2010s, the term "influencer" described digital content creators with a large following, distinctive brand persona, and a patterned relationship with commercial sponsors. By this period, influencer marketing had become a widely researched field globally, with systematic reviews drawing on hundreds of studies that documented the growing role of authenticity, audience engagement, and parasocial relationships in shaping how consumers responded to influencer content across different markets. During this period, influencer culture also developed through distinct channels outside Western markets. In South Korea, the global spread of Korean pop culture, also called K-Pop, through platforms such as YouTube, Facebook, and Twitter gave rise to what scholars have called 'Hallyu 2.0' or the 'New Korean Wave', where fans throughout Southeast Asia, North America, Latin America, and Europe shared, subtitled, and redistributed Korean music and film content on a large scale. This helped Korean entertainers to build substantial followings internationally. Consumers often mistakenly view celebrities as reliable, leading to trust and confidence in the products being promoted. A 2001 study from Rutgers University discovered that individuals were using "internet forums as influential sources of consumer information." The study proposes that consumers preferred internet forums and social media when making purchasing decisions over conventional advertising and print sources. An in

    Read more →
  • Batch cryptography

    Batch cryptography

    Batch cryptography is a field of cryptology focused on the design of cryptographic protocols that perform operations—such as encryption, decryption, key exchange, and authentication—on multiple inputs simultaneously, rather than processing each input individually. Batching cryptographic operations can significantly reduce the marginal cost of handling individual inputs—a principle that was first introduced by Amos Fiat in 1989.

    Read more →
  • Interplanetary Internet

    Interplanetary Internet

    The interplanetary Internet is a conceived computer network in space, consisting of a set of network nodes that can communicate with each other. These nodes are the planet's orbiters and landers, and the Earth ground stations. For example, the orbiters collect the scientific data from the Curiosity rover on Mars through near-Mars communication links, transmit the data to Earth through direct links from the Mars orbiters to the Earth ground stations via the NASA Deep Space Network, and finally the data routed through Earth's internal internet. Interplanetary communication is greatly delayed by interplanetary distances, as data transmission can only go as fast as the speed of light, so a new set of protocols and technologies that are tolerant to large delays and errors are required. The interplanetary Internet has been envisioned as a store and forward network of internets that is often disconnected, has a wireless backbone fraught with error-prone links and delays ranging from tens of minutes to even hours, even when there is a connection. As of 2024 agencies and companies working towards bringing the network to fruition include NASA, ESA, SpaceX and Blue Origin. == Challenges and reasons == In the core implementation of Interplanetary Internet, satellites orbiting a planet communicate to other planet's satellites. Simultaneously, these planets revolve around the Sun with long distances, and thus many challenges face the communications. The reasons and the resultant challenges are: The motion and long distances between planets: The interplanetary communication is greatly delayed due to the interplanetary distances and the motion of the planets. The delay is variable and long, ranging from a couple of minutes (Earth-to-Mars), to a couple of hours (Pluto-to-Earth), depending on their relative positions. The interplanetary communication also suspends due to the solar conjunction, when the sun's radiation hinders the direct communication between the planets. As such, the communication characterizes lossy links and intermittent link connectivity. Low embeddable payload: Satellites can only carry a small payload, which poses challenges to the power, mass, size, and cost for communication hardware design. An asymmetric bandwidth would be the result of this limitation. This asymmetry reaches ratios up to 1000:1 as downlink:uplink bandwidth portion. Absence of fixed infrastructure: The graph of participating nodes in a specific planet-to-planet communication keeps changing over time, due to the constant motion. The routes of the planet-to-planet communication are planned and scheduled rather than being opportunistic. The Interplanetary Internet design must address these challenges to operate successfully and achieve good communication with other planets. It also must use the few available resources efficiently in the system. == Development == Space communication technology has steadily evolved from expensive, one-of-a-kind point-to-point architectures, to the re-use of technology on successive missions, to the development of standard protocols agreed upon by space agencies of many countries. This last phase has gone on since 1982 through the efforts of the Consultative Committee for Space Data Systems (CCSDS), a body composed of the major space agencies of the world. It has 11 member agencies, 32 observer agencies, and over 119 industrial associates. The evolution of space data system standards has gone on in parallel with the evolution of the Internet, with conceptual cross-pollination where fruitful, but largely as a separate evolution. Since the late 1990s, familiar Internet protocols and CCSDS space link protocols have integrated and converged in several ways; for example, the successful FTP file transfer to Earth-orbiting STRV 1B on January 2, 1996, which ran FTP over the CCSDS IPv4-like Space Communications Protocol Specifications (SCPS) protocols. Internet Protocol use without CCSDS has taken place on spacecraft, e.g., demonstrations on the UoSAT-12 satellite, and operationally on the Disaster Monitoring Constellation. Having reached the era where networking and IP on board spacecraft have been shown to be feasible and reliable, a forward-looking study of the bigger picture was the next phase. The Interplanetary Internet study at NASA's Jet Propulsion Laboratory (JPL) was started by a team of scientists at JPL led by internet pioneer Vinton Cerf and the late Adrian Hooke. Cerf was appointed as a distinguished visiting scientist at JPL in 1998, while Hooke was one of the founders and directors of CCSDS. While IP-like SCPS protocols are feasible for short hops, such as ground station to orbiter, rover to lander, lander to orbiter, probe to flyby, and so on, delay-tolerant networking is needed to get information from one region of the Solar System to another. It becomes apparent that the concept of a region is a natural architectural factoring of the Interplanetary Internet. A region is an area where the characteristics of communication are the same. Region characteristics include communications, security, the maintenance of resources, perhaps ownership, and other factors. The Interplanetary Internet is a "network of regional internets". What is needed then, is a standard way to achieve end-to-end communication through multiple regions in a disconnected, variable-delay environment using a generalized suite of protocols. Examples of regions might include the terrestrial Internet as a region, a region on the surface of the Moon or Mars, or a ground-to-orbit region. The recognition of this requirement led to the concept of a "bundle" as a high-level way to address the generalized Store-and-Forward problem. Bundles are an area of new protocol development in the upper layers of the OSI model, above the Transport Layer with the goal of addressing the issue of bundling store-and-forward information so that it can reliably traverse radically dissimilar environments constituting a "network of regional internets". Delay-tolerant networking (DTN) was designed to enable standardized communications over long distances and through time delays. At its core is the Bundle Protocol (BP), which is similar to the Internet Protocol, or IP, that serves as the heart of the Internet here on Earth. The big difference between the regular Internet Protocol (IP) and the Bundle Protocol is that IP assumes a seamless end-to-end data path, while BP is built to account for errors and disconnections — glitches that commonly plague deep-space communications. Bundle Service Layering, implemented as the Bundling protocol suite for delay-tolerant networking, will provide general-purpose delay-tolerant protocol services in support of a range of applications: custody transfer, segmentation and reassembly, end-to-end reliability, end-to-end security, and end-to-end routing among them. The Bundle Protocol was first tested in space on the UK-DMC satellite in 2008. An example of one of these end-to-end applications flown on a space mission is the CCSDS File Delivery Protocol (CFDP), used on the Deep Impact comet mission. CFDP is an international standard for automatic, reliable file transfer in both directions. CFDP should not be confused with Coherent File Distribution Protocol, which has the same acronym and is an IETF-documented experimental protocol for rapidly deploying files to multiple targets in a highly networked environment. In addition to reliably copying a file from one entity (such as a spacecraft or ground station) to another entity, CFDP has the capability to reliably transmit arbitrarily small messages defined by the user, in the metadata accompanying the file, and to reliably transmit commands relating to file system management that are to be executed automatically on the remote end-point entity (such as a spacecraft) upon successful reception of a file. == Protocol == The Consultative Committee for Space Data Systems (CCSDS) packet telemetry standard defines the protocol used for the transmission of spacecraft instrument data over the deep-space channel. Under this standard, an image or other data sent from a spacecraft instrument is transmitted using one or more packets. === CCSDS packet definition === A packet is a block of data with length that can vary between successive packets, ranging from 7 to 65,542 bytes, including the packet header. Packetized data is transmitted via frames, which are fixed-length data blocks. The size of a frame, including frame header and control information, can range up to 2048 bytes. Packet sizes are fixed during the development phase. Because packet lengths are variable but frame lengths are fixed, packet boundaries usually do not coincide with frame boundaries. === Telecom processing notes === Data in a frame is typically protected from channel errors by error-correcting codes. Even when the channel errors exceed the correction capability of the error-correcting code, the presence of errors is nearly always detected by the e

    Read more →
  • Computer appliance

    Computer appliance

    A computer appliance is a computer system with a combination of hardware, software, or firmware that is specifically designed to provide a particular computing resource. Such devices became known as appliances because of the similarity in role or management to a home appliance, which are generally closed and sealed, and are not serviceable by the user or owner. The hardware and software are delivered as an integrated product and may even be pre-configured before delivery to a customer, to provide a turn-key solution for a particular application. Unlike general purpose computers, appliances are generally not designed to allow the customers to change the software and the underlying operating system, or to flexibly reconfigure the hardware. Another form of appliance is the virtual appliance, which has similar functionality to a dedicated hardware appliance, but is distributed as a software virtual machine image for a hypervisor-equipped device. == Overview == Traditionally, software applications run on top of a general-purpose operating system, which uses the hardware resources of the computer (primarily memory, disk storage, processing power, and networking bandwidth) to meet the computing needs of the user. The main issue with the traditional model is related to complexity. It is complex to integrate the operating system and applications with a hardware platform, and complex to support it afterwards. By tightly constraining the variations of the hardware and software, the appliance becomes easily deployable, and can be used without nearly as wide (or deep) IT knowledge. Additionally, when problems and errors appear, the supporting staff very rarely needs to explore them deeply to understand the matter thoroughly. The staff needs merely training on the appliance management software to be able to resolve most of problems. In all forms of the computer appliance model, customers benefit from easy operations. The appliance has exactly one combination of hardware and operating system and application software, which has been pre-installed at the factory. This prevents customers from needing to perform complex integration work, and dramatically simplifies troubleshooting. In fact, this "turnkey operation" characteristic is the driving benefit that customers seek when purchasing appliances. To be considered an appliance, the (hardware) device needs to be integrated with software, and both are supplied as a package. This distinguishes appliances from "home grown" solutions, or solutions requiring complex implementations by integrators or value-added resellers (VARs). The appliance approach helps to decouple the various systems and applications, for example in the data center. Once a resource is decoupled, in theory it can be also centralized to become shared among many systems, centrally managed and optimized, all without requiring changes to any other system. == Tradeoffs of the computer appliance approach == The major disadvantage of deploying a computer appliance is that since they are designed to supply a specific resource, they most often include a customized operating system running over specialized hardware, neither of which are likely to be compatible with the other systems previously deployed. Customers lose flexibility. One may believe that a proprietary embedded operating system, or operating system within an application, can make the appliance much more secure from common cyber attacks. However, the opposite is true. Security by obscurity is a poor security decision, and appliances are often plagued by security issues as evidenced by the proliferation of IoT devices. == Types of appliances == The variety of computer appliances reflects the wide range of computing resources they provide to applications. Some examples: Storage appliances provide large amounts of storage, often available to many machines on the network. See Network-attached storage and Storage area network. Network appliances are general purpose routers which may also provide firewall protection, Transport Layer Security (TLS), messaging, access to specialized networking protocols (like the ebXML Message Service) and bandwidth multiplexing for the multiple systems they front-end. Backup and disaster recovery appliances computer appliances that are integrated backup software and backup targets, sometimes with hypervisors to support local DR of protected servers. They are often a gateway to a full DRaaS solution. Firewall and Security appliances Dedicated network appliances that are designed to protect computer networks from unwanted traffic. IIoT and MES Gateway appliances Computer appliances that are designed to translate data bidirectionally between control systems and enterprise systems. Proprietary, embedded, firmware applications running on the appliance use point-to-point connections to translate data between field devices in their native automation protocols and MES systems through their APIs, ODBC, or RESTful interfaces. Anti-spam appliances for e-mail spam Software appliances A single application server appliance, with just enough operating system (JeOS) for it to run. Virtual machine appliances consist of a "hypervisor style" embedded operating system running on appliance hardware. The hypervisor layer is matched to the hardware of the appliance, and cannot be varied by the customer, but the customer may load other operating systems and applications onto the appliance in the form of virtual machines. == Consumer appliances == Aside from its deployment within data centers, many computer appliances are directly used by the general public. These include: Digital video recorder Residential gateway Network-attached storage (NAS) Video game console Consumer uses stress the need for an appliance to have easy installation, configuration, and operation, with little or no technical knowledge being necessary. == Appliances in industrial automation == The world of industrial automation has been rich in appliances. These appliances have been hardened to withstand temperature and vibration extremes. These appliances are also highly configurable, enabling customization to meet a wide variety of applications. The key benefits of an appliance in automation are: Reduced downtime - a failed appliance is typically replaced with a COTS replacement and its task is quickly and easily reloaded from a backup. Highly scalable - appliances are typically targeted solutions for an area of a plant or process. As the requirements change, scalability is achieved through the installation of another appliance. Automation concepts are easily replicated throughout the enterprise by standardizing on appliances to perform the needed tasks, as opposed to the development of custom automation schemes for each task. Low TCO (total cost of ownership) - appliances are developed, tested and supported by automation product vendors and undergo a much broader level of quality testing than custom designed automation solutions. The use of appliances in automation reduce the level of testing needed in each individual application. Reduced design time - appliances perform specific functions and although they are highly configurable, they are typically self documenting. This enables appliance based solutions to be transferred from engineer to engineer with minimal need for training and documentation. Types of automation appliances: PLC (programmable logic controller) - Programmable logic controllers are appliances that are typically used for discrete control and offer a wide range of Input and Output options. They are configured through standardized programming languages such as IEC-1131. PID (proportional–integral–derivative controller) - PID controllers are appliances that monitor a process variable and, based on an error term, effect change on a control output (manipulated variable) to drive the process variable to a setpoint. PAC (programmable automation controller) - Programmable automation controllers are appliances that embody properties of both PLCs and PID controllers enabling the integration of both analog and discrete control. Universal gateway - A universal gateway appliance has the ability to communicate with a variety of devices through their respective communication protocols, and will affect data transactions between them. This in increasingly important as manufacturing strives to improve agility, quality, production rates, production costs and reduce downtime through enhanced M2M (machine to machine) communications. EATMs (Enterprise Appliance Transaction Modules) - Enterprise appliance transaction modules are appliances that affect data transactions from plant floor automation systems to enterprise business systems. They communicate to plant floor equipment through various vendor automation protocols, and communicate to business systems through database communication protocols such as JMS (Java Message Service) and SQL (Structured Query Language). == Internal structure == There are several

    Read more →
  • Data Reference Model

    Data Reference Model

    The Data Reference Model (DRM) is one of the five reference models of the Federal Enterprise Architecture. == Overview == The DRM is a framework whose primary purpose is to enable information sharing and reuse across the United States federal government via the standard description and discovery of common data and the promotion of uniform data management practices. The DRM describes artifacts which can be generated from the data architectures of federal government agencies. The DRM provides a flexible and standards-based approach to accomplish its purpose. The scope of the DRM is broad, as it may be applied within a single agency, within a community of interest, or cross-community of interest. == Data Reference Model topics == === DRM structure === The DRM provides a standard means by which data may be described, categorized, and shared. These are reflected within each of the DRM's three standardization areas: Data Description: Provides a means to uniformly describe data, thereby supporting its discovery and sharing. Data Context: Facilitates discovery of data through an approach to the categorization of data according to taxonomies. Additionally, enables the definition of authoritative data assets within a community of interest. Data Sharing: Supports the access and exchange of data where access consists of ad hoc requests (such as a query of a data asset), and exchange consists of fixed, re-occurring transactions between parties. Enabled by capabilities provided by both the Data Context and Data Description standardization areas. === DRM Version 2 === The Data Reference Model version 2 released in November 2005 is a 114-page document with detailed architectural diagrams and an extensive glossary of terms. The DRM also make many references to ISO standards specifically the ISO/IEC 11179 metadata registry standard. === DRM usage === The DRM is not technically a published technical interoperability standard such as web services, it is an excellent starting point for data architects within federal and state agencies. Any federal or state agencies that are involved with exchanging information with other agencies or that are involved in data warehousing efforts should use this document as a guide.

    Read more →
  • Bitmap index

    Bitmap index

    A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications. Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively. Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema. == Example == Continuing the internet access example, a bitmap index may be logically viewed as follows: On the left, Identifier refers to the unique number assigned to each resident, HasInternet is the data to be indexed, the content of the bitmap index is shown as two columns under the heading bitmaps. Each column in the left illustration under the Bitmaps header is a bitmap in the bitmap index. In this case, there are two such bitmaps, one for "has internet" Yes and one for "has internet" No. It is easy to see that each bit in bitmap Y shows whether a particular row refers to a person who has internet access. This is the simplest form of bitmap index. Most columns will have more distinct values. For example, the sales amount is likely to have a much larger number of distinct values. Variations on the bitmap index can effectively index this data as well. We briefly review three such variations. Note: Many of the references cited here are reviewed at (John Wu (2007)). For those who might be interested in experimenting with some of the ideas mentioned here, many of them are implemented in open source software such as FastBit, the Lemur Bitmap Index C++ Library, the Roaring Bitmap Java library and the Apache Hive Data Warehouse system. == Compression == For historical reasons, bitmap compression and inverted list compression were developed as separate lines of research, and only later were recognized as solving essentially the same problem. Software can compress each bitmap in a bitmap index to save space. There has been considerable amount of work on this subject. Though there are exceptions such as Roaring bitmaps, Bitmap compression algorithms typically employ run-length encoding, such as the Byte-aligned Bitmap Code, the Word-Aligned Hybrid code, the Partitioned Word-Aligned Hybrid (PWAH) compression, the Position List Word Aligned Hybrid, the Compressed Adaptive Index (COMPAX), Enhanced Word-Aligned Hybrid (EWAH) and the COmpressed 'N' Composable Integer SEt (CONCISE). These compression methods require very little effort to compress and decompress. More importantly, bitmaps compressed with BBC, WAH, COMPAX, PLWAH, EWAH and CONCISE can directly participate in bitwise operations without decompression. This gives them considerable advantages over generic compression techniques such as LZ77. BBC compression and its derivatives are used in a commercial database management system. BBC is effective in both reducing index sizes and maintaining query performance. BBC encodes the bitmaps in bytes, while WAH encodes in words, better matching current CPUs. "On both synthetic data and real application data, the new word aligned schemes use only 50% more space, but perform logical operations on compressed data 12 times faster than BBC." PLWAH bitmaps were reported to take 50% of the storage space consumed by WAH bitmaps and offer up to 20% faster performance on logical operations. Similar considerations can be done for CONCISE and Enhanced Word-Aligned Hybrid. The performance of schemes such as BBC, WAH, PLWAH, EWAH, COMPAX and CONCISE is dependent on the order of the rows. A simple lexicographical sort can divide the index size by 9 and make indexes several times faster. The larger the table, the more important it is to sort the rows. Reshuffling techniques have also been proposed to achieve the same results of sorting when indexing streaming data. == Encoding == Basic bitmap indexes use one bitmap for each distinct value. It is possible to reduce the number of bitmaps used by using a different encoding method. For example, it is possible to encode C distinct values using log(C) bitmaps with binary encoding. This reduces the number of bitmaps, further saving space, but to answer any query, most of the bitmaps have to be accessed. This makes it potentially not as effective as scanning a vertical projection of the base data, also known as a materialized view or projection index. Finding the optimal encoding method that balances (arbitrary) query performance, index size and index maintenance remains a challenge. Without considering compression, Chan and Ioannidis analyzed a class of multi-component encoding methods and came to the conclusion that two-component encoding sits at the kink of the performance vs. index size curve and therefore represents the best trade-off between index size and query performance. == Binning == For high-cardinality columns, it is useful to bin the values, where each bin covers multiple values and build the bitmaps to represent the values in each bin. This approach reduces the number of bitmaps used regardless of encoding method. However, binned indexes can only answer some queries without examining the base data. For example, if a bin covers the range from 0.1 to 0.2, then when the user asks for all values less than 0.15, all rows that fall in the bin are possible hits and have to be checked to verify whether they are actually less than 0.15. The process of checking the base data is known as the candidate check. In most cases, the time used by the candidate check is significantly longer than the time needed to work with the bitmap index. Therefore, binned indexes exhibit irregular performance. They can be very fast for some queries, but much slower if the query does not exactly match a bin. == History == The concept of bitmap index was first introduced by Professor Israel Spiegler and Rafi Maayan in their research "Storage and Retrieval Considerations of Binary Data Bases", published in 1985. The first commercial database product to implement a bitmap index was Computer Corporation of America's Model 204. Patrick O'Neil published a paper about this implementation in 1987. This implementation is a hybrid between the basic bitmap index (without compression) and the list of Row Identifiers (RID-list). Overall, the index is organized as a B+tree. When the column cardinality is low, each leaf node of the B-tree would contain long list of RIDs. In this case, it requires less space to represent the RID-lists as bitmaps. Since each bitmap represents one distinct value, this is the basic bitmap index. As the column cardinality increases, each bitmap becomes sparse and it may take more disk space to store the bitmaps than to store the same content as RID-lists. In this case, it switches to use the RID-lists, which makes it a B+tree index. == In-memory bitmaps == One of the strongest reasons for using bitmap indexes is that the intermediate results produced from them are also bitmaps and can be efficiently reused in further operations to answer more complex queries. Many programming languages support this as a bit array data structure. For example, Java has the BitSet class and .NET have the BitArray class. Some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table. For tables with many columns, the total number of distinct indexes to satisfy all possible queries (with equality filtering conditions on either of the fields) grows very fast, being defined by this formula: C n [ n 2 ] ≡ n ! ( n − [ n 2 ] ) ! [ n 2 ] ! {\displaystyle \mathbf {C} _{n}^{\left[{\frac {n}{2}}\right]}\equiv {\frac {n!}{\left(n-\left[{\frac {n}{2}}\right]\right)!\left[{\frac {n}{2}}\right]!}}} . A bitmap index scan combines expressions on different indexes, thus requiring only one index per column t

    Read more →
  • Comparison of OLAP servers

    Comparison of OLAP servers

    The following tables compare general and technical information for a number of online analytical processing (OLAP) servers. Please see the individual products articles for further information. == General information == == Data storage modes == == APIs and query languages == APIs and query languages OLAP servers support. == OLAP distinctive features == A list of OLAP features that are not supported by all vendors. All vendors support features such as parent-child, multilevel hierarchy, drilldown. == System limits == == Security == == Operating systems == The OLAP servers can run on the following operating systems: Note (1):The server availability depends on Java Virtual Machine not on the operating system == Support information ==

    Read more →
  • Soterml

    Soterml

    SoTerML (Soil and Terrain Markup Language) is a XML-based markup language for storing and exchanging soil and terrain related data. SoTerML development is being done within The e-SoTer Platform. GEOSS plans a global Earth Observation System and, within this framework, the e-SOTER project addresses the felt need for a global soil and terrain database. The Centre for Geospatial Science (Currently Nottingham Geospatial Institute) at the University of Nottingham has initiated the development since January 2009. Further development and maintenance is currently handled in National Soil Resources Institute (NSRI) at Cranfield University, UK. The role of CGS is within the development of the e-SOTER dissemination platform, which is based on INSPIRE principles. The SoTerML development included: 1. Development of a data dictionary for nomenclatures and various data sources (data and metadata). 2. Development of an exchange format/procedures from the World Reference Base 2006.

    Read more →
  • Application delivery network

    Application delivery network

    An application delivery network (ADN) is a suite of technologies that, when deployed together, provide availability, security, visibility, and acceleration for Internet applications such as websites. ADN components provide supporting functionality that enables website content to be delivered to visitors and other users of that website, in a fast, secure, and reliable way. Gartner defines application delivery networking as the combination of WAN optimization controllers (WOCs) and application delivery controllers (ADCs). At the data center end of an ADN is the ADC, an advanced traffic management device that is often also referred to as a web switch, content switch, or multilayer switch, the purpose of which is to distribute traffic among a number of servers or geographically dislocated sites based on application specific criteria. In the branch office portion of an ADN is the WAN optimization controller, which works to reduce the number of bits that flow over the network using caching and compression, and shapes TCP traffic using prioritization and other optimization techniques. Some WOC components are installed on PCs or mobile clients, and there is typically a portion of the WOC installed in the data center. Application delivery networks are also offered by some CDN vendors. The ADC, one component of an ADN, evolved from layer 4-7 switches in the late 1990s when it became apparent that traditional load balancing techniques were not robust enough to handle the increasingly complex mix of application traffic being delivered over a wider variety of network connectivity options. == Application delivery techniques == The Internet was designed according to the end-to-end principle. This principle keeps the core network relatively simple and moves the intelligence as much as possible to the network end-points: the hosts and clients. An Application Delivery Network (ADN) enhances the delivery of applications across the Internet by employing a number of optimization techniques. Many of these techniques are based on established best-practices employed to efficiently route traffic at the network layer including redundancy and load balancing In theory, an Application Delivery Network (ADN) is closely related to a content delivery network. The difference between the two delivery networks lies in the intelligence of the ADN to understand and optimize applications, usually referred to as application fluency. Application Fluent Network (AFN) is based on the concept of Application Fluency to refer to WAN optimization techniques applied at Layer Four to Layer Seven of the OSI model for networks. Application Fluency implies that the network is fluent or intelligent in understanding and being able to optimize delivery of each application. Application Fluent Network is an addition of SDN capabilities. The acronym 'AFN' is used by Alcatel-Lucent Enterprise to refer to an Application Fluent Network. Application delivery uses one or more layer 4–7 switches, also known as a web switch, content switch, or multilayer switch to intelligently distribute traffic to a pool, also known as a cluster or farm, of servers. The application delivery controller (ADC) is assigned a single virtual IP address (VIP) that represents the pool of servers. Traffic arriving at the ADC is then directed to one of the servers in the pool (cluster, farm) based on a number of factors including application specific data values, application transport protocol, availability of servers, current performance metrics, and client-specific parameters. An ADN provides the advantages of load distribution, increase in capacity of servers, improved scalability, security, and increased reliability through application specific health checks. Increasingly the ADN comprises a redundant pair of ADC on which is integrated a number of different feature sets designed to provide security, availability, reliability, and acceleration functions. In some cases these devices are still separate entities, deployed together as a network of devices through which application traffic is delivered, each providing specific functionality that enhances the delivery of the application. == ADN optimization techniques == === TCP multiplexing === TCP Multiplexing is loosely based on established connection pooling techniques utilized by application server platforms to optimize the execution of database queries from within applications. An ADC establishes a number of connections to the servers in its pool and keeps the connections open. When a request is received by the ADC from the client, the request is evaluated and then directed to a server over an existing connection. This has the effect of reducing the overhead imposed by establishing and tearing down the TCP connection with the server, improving the responsiveness of the application. Some ADN implementations take this technique one step further and also multiplex HTTP and application requests. This has the benefit of executing requests in parallel, which enhances the performance of the application. === TCP optimization === There are a number of Request for Comments (RFCs) which describe mechanisms for improving the performance of TCP. Many ADN implement these RFCs in order to provide enhanced delivery of applications through more efficient use of TCP. The RFCs most commonly implemented are: Delayed Acknowledgements Nagle Algorithm Selective Acknowledgements Explicit Congestion Notification ECN Limited and Fast Retransmits Adaptive Initial Congestion Windows === Data compression and caching === ADNs also provide optimization of application data through caching and compression techniques. There are two types of compression used by ADNs today: industry standard HTTP compression and proprietary data reduction algorithms. It is important to note that the cost in CPU cycles to compress data when traversing a LAN can result in a negative performance impact and therefore best practices are to only utilize compression when delivering applications via a WAN or particularly congested high-speed data link. HTTP compression is asymmetric and transparent to the client. Support for HTTP compression is built into web servers and web browsers. All commercial ADN products currently support HTTP compression. A second compression technique is achieved through data reduction algorithms. Because these algorithms are proprietary and modify the application traffic, they are symmetric and require a device to reassemble the application traffic before the client can receive it. A separate class of devices known as WAN Optimization Controllers (WOC) provide this functionality, but the technology has been slowly added to the ADN portfolio over the past few years as this class of device continues to become more application aware, providing additional features for specific applications such as CIFS and SMB. == ADN reliability and availability techniques == === Advanced health checking === Advanced health checking is the ability of an ADN to determine not only the state of the server on which an application is hosted, but the status of the application it is delivering. Advanced health checking techniques allow the ADC to intelligently determine whether or not the content being returned by the server is correct and should be delivered to the client. This feature enables other reliability features in the ADN, such as resending a request to a different server if the content returned by the original server is found to be erroneous. === Load balancing algorithms === The load balancing algorithms found in today's ADN are far more advanced than the simplistic round-robin and least connections algorithms used in the early 1990s. These algorithms were originally loosely based on operating systems' scheduling algorithms, but have since evolved to factor in conditions peculiar to networking and application environments. It is more accurate to describe today's "load balancing" algorithms as application routing algorithms, as most ADN employ application awareness to determine whether an application is available to respond to a request. This includes the ability of the ADN to determine not only whether the application is available, but whether or not the application can respond to the request within specified parameters, often referred to as a service level agreement. Typical industry standard load balancing algorithms available today include: Round Robin Least Connections Fastest Response Time Weighted Round Robin Weighted Least Connections Custom values assigned to individual servers in a pool based on SNMP or other communication mechanism === Fault tolerance === The ADN provides fault tolerance at the server level, within pools or farms. This is accomplished by designating specific servers as a 'backup' that is activated automatically by the ADN in the event that the primary server(s) in the pool fail. The ADN also ensures application availability and reliability through its ability to seamlessly "failover"

    Read more →
  • Strong cryptography

    Strong cryptography

    Strong cryptography or cryptographically strong are general terms used to designate the cryptographic algorithms that, when used correctly, provide a very high (usually insurmountable) level of protection against any eavesdropper, including the government agencies. There is no precise definition of the boundary line between the strong cryptography and (breakable) weak cryptography, as this border constantly shifts due to improvements in hardware and cryptanalysis techniques. These improvements eventually place the capabilities once available only to the NSA within the reach of a skilled individual, so in practice there are only two levels of cryptographic security, "cryptography that will stop your kid sister from reading your files, and cryptography that will stop major governments from reading your files" (Bruce Schneier). The strong cryptography algorithms have high security strength, for practical purposes usually defined as a number of bits in the key. For example, the United States government, when dealing with export control of encryption, considered as of 1999 any implementation of the symmetric encryption algorithm with the key length above 56 bits or its public key equivalent to be strong and thus potentially a subject to the export licensing. To be strong, an algorithm needs to have a sufficiently long key and be free of known mathematical weaknesses, as exploitation of these effectively reduces the key size. At the beginning of the 21st century, the typical security strength of the strong symmetrical encryption algorithms is 128 bits (slightly lower values still can be strong, but usually there is little technical gain in using smaller key sizes). Demonstrating the resistance of any cryptographic scheme to attack is a complex matter, requiring extensive testing and reviews, preferably in a public forum. Good algorithms and protocols are required (similarly, good materials are required to construct a strong building), but good system design and implementation is needed as well: "it is possible to build a cryptographically weak system using strong algorithms and protocols" (just like the use of good materials in construction does not guarantee a solid structure). Many real-life systems turn out to be weak when the strong cryptography is not used properly, for example, random nonces are reused A successful attack might not even involve algorithm at all, for example, if the key is generated from a password, guessing a weak password is easy and does not depend on the strength of the cryptographic primitives. A user can become the weakest link in the overall picture, for example, by sharing passwords and hardware tokens with the colleagues. == Background == The level of expense required for strong cryptography originally restricted its use to the government and military agencies, until the middle of the 20th century the process of encryption required a lot of human labor and errors (preventing the decryption) were very common, so only a small share of written information could have been encrypted. US government, in particular, was able to keep a monopoly on the development and use of cryptography in the US into the 1960s. In the 1970, the increased availability of powerful computers and unclassified research breakthroughs (Data Encryption Standard, the Diffie-Hellman and RSA algorithms) made strong cryptography available for civilian use. Mid-1990s saw the worldwide proliferation of knowledge and tools for strong cryptography. By the 21st century the technical limitations were gone, although the majority of the communication were still unencrypted. At the same the cost of building and running systems with strong cryptography became roughly the same as the one for the weak cryptography. The use of computers changed the process of cryptanalysis, famously with Bletchley Park's Colossus. But just as the development of digital computers and electronics helped in cryptanalysis, it also made possible much more complex ciphers. It is typically the case that use of a quality cipher is very efficient, while breaking it requires an effort many orders of magnitude larger - making cryptanalysis so inefficient and impractical as to be effectively impossible. == Cryptographically strong algorithms == This term "cryptographically strong" is often used to describe an encryption algorithm, and implies, in comparison to some other algorithm (which is thus cryptographically weak), greater resistance to attack. But it can also be used to describe hashing and unique identifier and filename creation algorithms. See for example the description of the Microsoft .NET runtime library function Path.GetRandomFileName. In this usage, the term means "difficult to guess". An encryption algorithm is intended to be unbreakable (in which case it is as strong as it can ever be), but might be breakable (in which case it is as weak as it can ever be) so there is not, in principle, a continuum of strength as the idiom would seem to imply: Algorithm A is stronger than Algorithm B which is stronger than Algorithm C, and so on. The situation is made more complex, and less subsumable into a single strength metric, by the fact that there are many types of cryptanalytic attack and that any given algorithm is likely to force the attacker to do more work to break it when using one attack than another. There is only one known unbreakable cryptographic system, the one-time pad, which is not generally possible to use because of the difficulties involved in exchanging one-time pads without them being compromised. So any encryption algorithm can be compared to the perfect algorithm, the one-time pad. The usual sense in which this term is (loosely) used, is in reference to a particular attack, brute force key search — especially in explanations for newcomers to the field. Indeed, with this attack (always assuming keys to have been randomly chosen), there is a continuum of resistance depending on the length of the key used. But even so there are two major problems: many algorithms allow use of different length keys at different times, and any algorithm can forgo use of the full key length possible. Thus, Blowfish and RC5 are block cipher algorithms whose design specifically allowed for several key lengths, and who cannot therefore be said to have any particular strength with respect to brute force key search. Furthermore, US export regulations restrict key length for exportable cryptographic products and in several cases in the 1980s and 1990s (e.g., famously in the case of Lotus Notes' export approval) only partial keys were used, decreasing 'strength' against brute force attack for those (export) versions. More or less the same thing happened outside the US as well, as for example in the case of more than one of the cryptographic algorithms in the GSM cellular telephone standard. The term is commonly used to convey that some algorithm is suitable for some task in cryptography or information security, but also resists cryptanalysis and has no, or fewer, security weaknesses. Tasks are varied, and might include: generating randomness encrypting data providing a method to ensure data integrity Cryptographically strong would seem to mean that the described method has some kind of maturity, perhaps even approved for use against different kinds of systematic attacks in theory and/or practice. Indeed, that the method may resist those attacks long enough to protect the information carried (and what stands behind the information) for a useful length of time. But due to the complexity and subtlety of the field, neither is almost ever the case. Since such assurances are not actually available in real practice, sleight of hand in language which implies that they are will generally be misleading. There will always be uncertainty as advances (e.g., in cryptanalytic theory or merely affordable computer capacity) may reduce the effort needed to successfully use some attack method against an algorithm. In addition, actual use of cryptographic algorithms requires their encapsulation in a cryptosystem, and doing so often introduces vulnerabilities which are not due to faults in an algorithm. For example, essentially all algorithms require random choice of keys, and any cryptosystem which does not provide such keys will be subject to attack regardless of any attack resistant qualities of the encryption algorithm(s) used. == Legal issues == Widespread use of encryption increases the costs of surveillance, so the government policies aim to regulate the use of the strong cryptography. In the 2000s, the effect of encryption on the surveillance capabilities was limited by the ever-increasing share of communications going through the global social media platforms, that did not use the strong encryption and provided governments with the requested data. Murphy talks about a legislative balance that needs to be struck between the power of the government that are broad enough to be able to follow the qui

    Read more →
  • Virtual influencer

    Virtual influencer

    A virtual influencer, sometimes described as a virtual persona or virtual model, is a computer-generated fictional character that can be used for a variety of marketing-related purposes, but most frequently for social media marketing, in lieu of online human "influencers". Most virtual influencers are designed using computer graphics and motion capture technology to resemble real people in realistic situations. Common derivatives of virtual influencers include VTubers, which broadly refer to online entertainers and YouTubers who represent themselves using virtual avatars instead of their physical selves. == History == Virtual influencers are fundamentally synonymous with virtual idols, which originate from Japan's anime and Japanese idol culture that dates back to the 1980s. The first virtual idol created was Lynn Minmay, a fictional singer and main character of the anime television series Super Dimension Fortress Macross (1982) and the animated film adaptation Macross: Do You Remember Love? (1984). Minmay's success led to the production of more Japanese virtual idols, such as EVE from the Japanese cyberpunk anime Megazone 23 (1985), and Sharon Apple in Macross Plus (1994). Virtual idols were not always well received – in 1995, Japanese talent agency Horipro created Kyoko Date, which was inspired by the Macross franchise and dating sim games such as Tokimeki Memorial (1994). Date failed to gain commercial success despite drawing headlines for her debut as a CGI idol, largely due to technical limitations leading to issues such as unnatural movements, an issue also known as the uncanny valley. Since their inception, many virtual idols created have achieved continual success, with notable names including the Vocaloid singer Hatsune Miku, and the VTuber Kizuna AI. Technological advancements have also enabled production teams to use artificial intelligence and advanced techniques to customize the personalities and behavior of virtual idols. Due to modern-day advancements in technology, many virtual idols have held real-life tours and events. Notable ones include Hatsune Miku's titular tour Miku Expo and Hololive's concerts with many of their idols from their English, Japanese and Indonesian branches. Some notable events including virtual singers and influencers have included: Hatsune Miku opening for Lady Gaga in 2014 and Hoshimachi Suisei's concerts at the famous Budokan venue in Japan and her addition to the Forbes Japan list of '30 Under 30' individuals who are changing the world in their respective fields. == Benefits and criticism == From a branding perspective, virtual influencers are perceived to be much less likely to be mired in scandals. In China, celebrities caught in bad publicity such as singer Wang Leehom and entertainer Kris Wu have heightened the appeal of virtual influencers, since their existence relies entirely on computer-generated imagery and they are therefore unlikely to cause any damage to a brand's image by association. Some studies have also suggested that Generation Z consumers have a unique appetite for virtual idols and influencers, since they grew up in the age of the internet. Studies also show that human-like appearance of virtual influencers show higher message credibility than anime-like virtual influencers. Scholars and commentators have also questioned the ethics and cultural impact of virtual influencers, arguing that computer-generated personas can entrench unrealistic beauty standards while diffusing accountability for labor, identity, and consent. Business and marketing analysts have also warned that disclosure and governance remain inconsistent, recommending clearer guardrails and transparency when brands deploy synthetic spokespeople. In 2025, reporting highlighted concerns that AI-driven "virtual humans" could displace human creators and sales workers, intensifying debates over the future of creative labor and authenticity online. == Notable examples == === Virtual bands === Eternity - A South Korean virtual idol group formed by Pulse9. Gorillaz - A virtual band formed in 1998. K/DA - A virtual K-pop girl group created as part of the League of Legends video game franchise. MAVE: - A South Korean virtual girl group formed in 2023 by Metaverse Entertainment. Pentakill - A virtual heavy metal band created as part of the League of Legends video game franchise. Plave (band) - A South Korean virtual boy band formed by VLast. Squid Sisters and Off the Hook - Two virtual pop idol duos as part of the Splatoon series. Studio Killers - A Finnish-Danish-British virtual band formed in 2011. === Vocaloids === Hatsune Miku (modeled after Saki Fujita) Kagamine Rin/Len (modeled after Asami Shimoda) Megurine Luka (modeled after Yū Asakawa) Meiko (modeled after Meiko Haigō) Kaito (modeled after Naoto Fūga) === VTubers === Kano Kizuna AI Neuro-sama VShojo Ironmouse Projekt Melody Nijisanji Hololive Akai Haato Gawr Gura Hoshimachi Suisei Natsuiro Matsuri === Other examples === Ami Yamato Crazy Frog FN Meka IA Kuki AI Kyoko Date Kyra Miquela Naevis Shudu Gram

    Read more →
  • Image

    Image

    An image or picture is a visual representation. An image can be two-dimensional, such as a drawing, painting, or photograph, or three-dimensional, such as a carving or sculpture. Images may be displayed through other media, including a projection on a surface, activation of electronic signals, or digital displays; they can also be reproduced through mechanical means, such as photography, printmaking, or photocopying. Images can also be animated through digital or physical processes. In the context of signal processing, an image is a distributed amplitude of color(s). In optics, the term image (or optical image) refers specifically to the reproduction of an object formed by light waves coming from the object. A volatile image exists or is perceived only for a short period. This may be a reflection of an object by a mirror, a projection of a camera obscura, or a scene displayed on a cathode-ray tube. A fixed image, also called a hard copy, is one that has been recorded on a material object, such as paper or textile. A mental image exists in an individual's mind as something one remembers or imagines. The subject of an image does not need to be real; it may be an abstract concept such as a graph or function or an imaginary entity. For a mental image to be understood outside of an individual's mind, however, there must be a way of conveying that mental image through the words or visual productions of the subject. == Characteristics == === Two-dimensional images === The broader sense of the word 'image' also encompasses any two-dimensional figure, such as a map, graph, pie chart, painting, or banner. In this wider sense, images can also be rendered manually, such as by drawing, the art of painting, or the graphic arts (such as lithography or etching). Additionally, images can be rendered automatically through printing, computer graphics technology, or a combination of both methods. A two-dimensional image does not need to use the entire visual system to be a visual representation. An example of this is a grayscale ("black and white") image, which uses the visual system's sensitivity to brightness across all wavelengths without taking into account different colors. A black-and-white visual representation of something is still an image, even though it does not fully use the visual system's capabilities. On the other hand, some processes can be used to create visual representations of objects that are otherwise inaccessible to the human visual system. These include microscopy for the magnification of minute objects, telescopes that can observe objects at great distances, X-rays that can visually represent the interior structures of the human body (among other objects), magnetic resonance imaging (MRI), positron emission tomography (PET scans), and others. Such processes often rely on detecting electromagnetic radiation that occurs beyond the light spectrum visible to the human eye and converting such signals into recognizable images. === Three-dimensional images === Aside from sculpture and other physical activities that can create three-dimensional images from solid material, some modern techniques, such as holography, can create three-dimensional images that are reproducible but intangible to human touch. Some photographic processes can now render the illusion of depth in an otherwise "flat" image, but "3-D photography" (stereoscopy) or "3-D film" are optical illusions that require special devices such as eyeglasses to create the illusion of depth. === Moving images === "Moving" two-dimensional images are actually illusions of movement perceived when still images are displayed in sequence, each image lasting less, and sometimes much less, than a fraction of a second. The traditional standard for the display of individual frames by a motion picture projector has been 24 frames per second (FPS) since at least the commercial introduction of "talking pictures" in the late 1920s, which necessitated a standard for synchronizing images and sounds. Even in electronic formats such as television and digital image displays, the apparent "motion" is actually the result of many individual lines giving the impression of continuous movement. This phenomenon has often been described as "persistence of vision": a physiological effect of light impressions remaining on the retina of the eye for very brief periods. Even though the term is still sometimes used in popular discussions of movies, it is not a scientifically valid explanation. Other terms emphasize the complex cognitive operations of the brain and the human visual system. "Flicker fusion", the "phi phenomenon", and "beta movement" are among the terms that have replaced "persistence of vision", though no one term seems adequate to describe the process. == Cultural and other uses == Image-making seems to have been common to virtually all human cultures since at least the Paleolithic era. Prehistoric examples of rock art—including cave paintings, petroglyphs, rock reliefs, and geoglyphs—have been found on every inhabited continent. Many of these images seem to have served various purposes: as a form of record-keeping; as an element of spiritual, religious, or magical practice; or even as a form of communication. Early writing systems, including hieroglyphics, ideographic writing, and even the Roman alphabet, owe their origins in some respects to pictorial representations. === Meaning and signification === Images of any type may convey different meanings and sensations for individual viewers, regardless of whether the image's creator intended them. An image may be taken simply as a more or less "accurate" copy of a person, place, thing, or event. It may represent an abstract concept, such as the political power of a ruler or ruling class, a practical or moral lesson, an object for spiritual or religious veneration, or an object—human or otherwise—to be desired. It may also be regarded for its purely aesthetic qualities, rarity, or monetary value. Such reactions can depend on the viewer's context. A religious image in a church may be regarded differently than the same image mounted in a museum. Some might view it simply as an object to be bought or sold. Viewers' reactions will also be guided or shaped by their education, class, race, and other contexts. The study of emotional sensations and their relationship to any given image falls into the categories of aesthetics and the philosophy of art. While such studies inevitably deal with issues of meaning, another approach to signification was suggested by the American philosopher, logician, and semiotician Charles Sanders Peirce. "Images" are one type of the broad category of "signs" proposed by Peirce. Although his ideas are complex and have changed over time, the three categories of signs that he distinguished stand out: The "icon," which relates to an object by resemblance to some quality of the object. A painted or photographed portrait is an icon by virtue of its resemblance to the painting's or photograph's subject. A more abstract representation, such as a map or diagram, can also be an icon. The "index," which relates to an object by some real connection. For example, smoke may be an index of fire, or the temperature recorded on a thermometer may be an index of a patient's illness or health. The "symbol," which lacks direct resemblance or connection to an object but whose association is arbitrarily assigned by the creator or dictated by cultural and historical habit, convention, etc. The color red, for example, may connote rage, beauty, prosperity, political affiliation, or other meanings within a given culture or context; the Swedish film director Ingmar Bergman claimed that his use of the color in his 1972 film Cries and Whispers came from his personal visualization of the human soul. A single image may exist in all three categories at the same time. The Statue of Liberty provides an example. While there have been countless two-dimensional and three-dimensional "reproductions" of the statue (i.e., "icons" themselves), the statue itself exists as an "icon" by virtue of its resemblance to a human woman (or, more specifically, previous representations of the Roman goddess Libertas or the female model used by the artist Frederic-Auguste Bartholdi). an "index" representing New York City or the United States of America in general due to its placement in New York Harbor, or with "immigration" from its proximity to the immigration center at Ellis Island. a "symbol" as a visualization of the abstract concept of "liberty" or "freedom" or even "opportunity" or "diversity". === Critiques of imagery === The nature of images, whether three-dimensional or two-dimensional, created for a specific purpose or only for aesthetic pleasure, has continued to provoke questions and even condemnation at different times and places. In his dialogue, The Republic, the Greek philosopher Plato described our apparent reality as a copy of a higher order of universal forms.

    Read more →
  • Majal (organization)

    Majal (organization)

    Majal is a regional not-for-profit organization focused on "amplifying voices of dissent" throughout the Middle East and North Africa via digital media. Founded in Bahrain, the organization "creates platforms and web applications that promote freedom of expression and social justice." Majal, which relies on open source platforms, like WordPress and Ruby on Rails, was launched in 2006 by Esra'a Al Shafei as a simple group-blogging idea. However, it has changed course to focus on the development of unique applications and tools. == Objectives and means == Majal's content, in addition to its projects and applications, is free open source content to ensure right to access information for everyone. The organization uses a broad spectrum of social media tools, ranging from written blogs, podcasts, vlogs, comics, video animation and pictures to live broadcasting through radio. == Projects and applications == Majal runs various active projects that include Alliance for Kurdish Rights, The Muslim Network for Baháʼí Rights, a discussion tool for Arab LGBT youth and various Mobile apps. == Funding == Majal is funded through private donations and grants from non-governmental organizations, as well as any potential revenues earned through freelance development. Its primary funders are the Shuttleworth Foundation and the Omidyar Network. In 2008, Majal won the Berkman Award from the Berkman Klein Center for Internet & Society at Harvard University in the Human Rights/Global Advocacy category. This $10,000 award was Majal’s first source of funding. This award is presented to “people or institutions that have made a significant contribution to the Internet and its impact on society over the past decade.” In 2009, the March 18 Movement, a project of Majal, received the Think Social Award, which demonstrates how social media can be used to solve the world’s problems. Esra'a Al-Shafei was named a 2009 Echoing Green Fellow for Civil and Human Rights, a seed funding award for young entrepreneurs engaged in social change. Financially, the fellowship consists of a $60,000 stipend paid over two years. Most recently, MEY has received a grant from the Arab Fund for Arts and Culture for its Mideast Tunes website. == Awards == Winner of Human Rights Tulip 2014 Human Rights Tulip - Human rights - Government.nl Ashoka Changemakers Citizen Media competition in 2011 for their CrowdVoice project. Monaco Media Prize 2011 for Majal founder and director Esra'a Al Shafei in 2011. The BOBS Special Topic Human Rights award in 2011 for the Majal website Migrant Rights. ThinkSocial Award in 2009, as powerful model for how social media can be used to address global problems. Echoing Green, 2009 Fellowship. TEDGlobal 2009 Fellowship. Berkman Award for Internet Innovation from Berkman Klein Center for Internet & Society at Harvard Law School in 2008 for the outstanding contributions to the internet and its impact on society. The Global Journal selected Majal as one of the Top 100 NGOs in 2013. 2013-2014 Shuttleworth Foundation Fellowship. == Leadership == Majal team is led primarily by women. The organization was founded by Esra'a Al Shafei, a blogger from Bahrain in 2006. Ahmed Zidan of Egypt has served for over three years as the Editor-in-Chief of Majal Arabic, and is the co-founder of Ahwaa, and is also a podcaster. Other team members include Mona Kareem, Rima Kalush, Abir Ghattas, Namita Malhotra, and Vani Saraswathi. == 2011 Middle East and North Africa protests == Blogs and video played a role in the documentation of protests throughout the Middle East and North Africa during 2010-2011, also known as the Arab Spring. During this period, MEY's project, CrowdVoice (launched in 2010) helped curate and archive the large amounts of videos, images, and eye-witness reports being aggregated and crowdsourced from across the region. As a result, it had been censored temporarily in Yemen and is still censored in Bahrain. == Media coverage == Majal claims to have received various coverage from news agencies, TV satellite channels, radio stations, newspapers, magazines. For instance, Sky News, CNN, New York Times, BBC, The Guardian, NPR, Time, MTV political blog "Act", VH1, Daily Telegraph, Die Zeit, Frankfurter Rundschau FR-online, Toronto Star, TechCrunch, Rolling Stone Middle East, Abu Dhabi TV, Gulf News, Al-Hasnaa' magazine, ReadWriteWeb, Mashable, The Next Web, Radio Sawt Beirut International, Radio Farda among many others.

    Read more →
  • Data profiling

    Data profiling

    Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to: Find out whether existing data can be easily used for other purposes Improve the ability to search data by tagging it with keywords, descriptions, or assigning it to a category Assess data quality, including whether the data conforms to particular standards or patterns Assess the risk involved in integrating data in new applications, including the challenges of joins Discover metadata of the source database, including value patterns and distributions, key candidates, foreign-key candidates, and functional dependencies Assess whether known metadata accurately describes the actual values in the source database Understanding data challenges early in any data intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance for improving data quality. == Introduction == Data profiling refers to the analysis of information for use in a data warehouse in order to clarify the structure, content, relationships, and derivation rules of the data. Profiling helps to not only understand anomalies and assess data quality, but also to discover, register, and assess enterprise metadata. The result of the analysis is used to determine the suitability of the candidate source systems, usually giving the basis for an early go/no-go decision, and also to identify problems for later solution design. == How data profiling is conducted == Data profiling utilizes methods of descriptive statistics such as minimum, maximum, mean, mode, percentile, standard deviation, frequency, variation, aggregates such as count and sum, and additional metadata information obtained during data profiling such as data type, length, discrete values, uniqueness, occurrence of null values, typical string patterns, and abstract type recognition. The metadata can then be used to discover problems such as illegal values, misspellings, missing values, varying value representation, and duplicates. Different analyses are performed for different structural levels. E.g. single columns could be profiled individually to get an understanding of frequency distribution of different values, type, and use of each column. Embedded value dependencies can be exposed in a cross-columns analysis. Finally, overlapping value sets possibly representing foreign key relationships between entities can be explored in an inter-table analysis. Normally, purpose-built tools are used for data profiling to ease the process. The computational complexity increases when going from single column, to single table, to cross-table structural profiling. Therefore, performance is an evaluation criterion for profiling tools. == When is data profiling conducted? == According to Kimball, data profiling is performed several times and with varying intensity throughout the data warehouse developing process. A light profiling assessment should be undertaken immediately after candidate source systems have been identified and DW/BI business requirements have been satisfied. The purpose of this initial analysis is to clarify at an early stage if the correct data is available at the appropriate detail level and that anomalies can be handled subsequently. If this is not the case the project may be terminated. Additionally, more in-depth profiling is done prior to the dimensional modeling process in order assess what is required to convert data into a dimensional model. Detailed profiling extends into the ETL system design process in order to determine the appropriate data to extract and which filters to apply to the data set. Additionally, data profiling may be conducted in the data warehouse development process after data has been loaded into staging, the data marts, etc. Conducting data at these stages helps ensure that data cleaning and transformations have been done correctly and in compliance of requirements. == Benefits and examples == Data profiling can improve data quality, shorten the implementation cycle of major projects, and improve users' understanding of data. Discovering business knowledge embedded in data itself is one of the significant benefits derived from data profiling. It can improve data accuracy in corporate databases.

    Read more →