Loab

Loab

Loab ( LOBE) is a fictional character that artist and writer Steph Maj Swanson claimed to have discovered with a text-to-image AI model in April 2022. In a viral Twitter thread, Swanson described the images of Loab as an unexpectedly emergent property of the software, saying they discovered them when asking the model to produce something "as different from the prompt as possible". == History == The Sweden-based artist Steph Maj Swanson said that they first generated these images in April 2022 by using the algorithmic technique of "negative prompt weights" accessing latent space. The initial prompt - 'Brando::-1', requesting the opposite of actor Marlon Brando - generated a "skyline logo" with the cryptic lettering "DIGITA PNTICS". Attempting to generate the opposite of this image using the prompt "DIGITA PNTICS skyline logo::-1" yielded what Swanson described as "off-putting images, all of the same devastated-looking older woman with defined triangles of rosacea(?) on her cheeks". Swanson nicknamed the character "Loab", after one of the generated images resembled an album cover that included the printed word "loab". Swanson says that using the image as a prompt for further images produced increasingly violent and gory results. Swanson speculated that something about the image could be "adjacent to extremely gory and macabre imagery in the distribution of the AI's world knowledge". Swanson says that when they combined images of Loab with other pictures, the subsequent results consistently return an image including Loab, regardless of how much distortion they added to the prompts to try and remove her visage. Swanson speculated that the latent space region of the AI map that Loab is located in, in addition to being near gruesome imagery, must be isolated enough that any combinations with other images could only use Loab from her area and no related images due to its isolation. After enough crossbreeding of images and dilution attempts, Swanson was able to eventually generate images without Loab, but found that crossbreeding those diluted images would also eventually lead to a version of Loab to reappear in the resulting images. Swanson has said that "for various reasons" they declined to disclose the software used to create the images. Loab has been referred to as the "first AI-generated cryptid" and as such has gone viral. Despite hyping up the cryptid nature of the discovery in their wording, Swanson admitted that "Loab isn't really haunted, of course", but noted that the mythos that has sprung up around the AI-generated character has gone beyond their initial involvement. Swanson speculated that people sharing pictures and memes of Loab would lead future AIs to use those images as a part of their latent space maps, making her an innate part of the internet landscape, with Swanson adding "If we want to get rid of her, it's already too late." == Response == There has been discussion of whether the Loab series of images are "a legitimate quirk of AI art software, or a cleverly disguised creepypasta." Smithsonian magazine has written that "Loab sparked some lengthy ethical conversations around visual aesthetics, art and technology," and some have criticized the labeling of a woman with rosacea as a horror image, considering this to be "stigmatizing disability". Swanson responded that if the AI map is combining Loab with violent imagery, then that is a "social bias" in the data being used for the image modeling software. The Atlantic writer Stephen Marche described Loab as a "form of expression that has never existed before" whose authorship is unclear and that exists as an "emanation of the collective imagistic heritage, the unconscious visual mind". Laurens Verhagen in de Volkskrant commented that rather than showing that there are "dark horror creatures hidden deep within AI", the existence of Loab instead implies that our current "understanding of AI is limited". Mhairi Aitken at the Alan Turing Institute stated that rather than a "creepy" emergent property, output results like Loab were representative of the "limitations of AI image-generator models" and was more concerned about the urban legends that are born from such "boring" innocuous things and how easily "other people take these things seriously". Carly Cassella for ScienceAlert described Loab as a "modern day tronie" (a style of Dutch painting) that is not representative of an actual person, but just a concept or idea, similar but distinct from works like the Girl With A Pearl Earring. Wired's Joel Warner argued that Loab was only the beginning and that, with AI text generators such as ChatGPT becoming more commonplace, a "linguistic version of Loab" would emerge in that space as well and begin creating ideas through "intentional prompts" or otherwise that will be as disturbing as The 120 Days of Sodom.

Cloud management

Cloud management refers to the administration and oversight of cloud computing products and services. Public clouds are managed by cloud service providers, which operate the underlying infrastructure such as servers, storage, networking, and data center facilities. Users may also opt to manage their public cloud services with a third-party cloud management tool. Users of public cloud services can generally select from three basic cloud provisioning categories: User self-provisioning: Customers purchase cloud services directly from the provider, typically through a web form or console interface. The customer pays on a per-transaction basis. Advanced provisioning: Customers contract in advance a predetermined amount of resources, which are prepared in advance of service. The customer pays a flat fee or a monthly fee. Dynamic provisioning: The provider allocates resources when the customer needs them, then decommissions them when they are no longer needed. The customer is charged on a pay-per-use basis. Managing a private cloud requires software tools to help create a virtualized pool of compute resources, provide a self-service portal for end users and handle security, resource allocation, tracking and billing. Management tools for private clouds tend to be service driven, as opposed to resource driven, because cloud environments are typically highly virtualized and organized in terms of portable workloads. In hybrid cloud environments, compute, network and storage resources must be managed across multiple domains, so a good management strategy should start by defining what needs to be managed, and where and how to do it. Policies to help govern these domains should include configuration and installation of images, access control, and budgeting and reporting. Access control often includes the use of Single sign-on (SSO), in which a user logs in once and gains access to all systems without being prompted to log in again at each of them. == Characteristics of Cloud Management == Cloud management combines software and technologies in a design for managing cloud environments. Software developers have responded to the management challenges of cloud computing with a variety of cloud management platforms and tools. These tools include native tools offered by public cloud providers as well as third-party tools designed to provide consistent functionality across multiple cloud providers. Administrators must balance the competing requirements of efficient consistency across different cloud platforms with access to different native functionality within individual cloud platforms. The growing acceptance of public cloud and increased multicloud usage is driving the need for consistent cross-platform management. Rapid adoption of cloud services is introducing a new set of management challenges for those technical professionals responsible for managing IT systems and services. Cloud-management platforms and tools should have the ability to provide minimum functionality in the following categories. Functionality can be both natively provided or orchestrated via third-party integration. Provisioning and orchestration: create, modify, and delete resources as well as orchestrate workflows and management of workloads Automation: Enable cloud consumption and deployment of app services via infrastructure-as-code and other DevOps concepts Security and compliance: manage role-based access of cloud services and enforce security configurations Service request: collect and fulfill requests from users to access and deploy cloud resources. Monitoring and logging: collect performance and availability metrics as well as automate incident management and log aggregation Inventory and classification: discover and maintain pre-existing brownfield cloud resources plus monitor and manage changes Cost management and optimization: track and rightsize cloud spend and align capacity and performance to actual demand Migration, backup, and DR: enable data protection, disaster recovery, and data mobility via snapshots and/or data replication Organizations may group these criteria into key use cases including Cloud Brokerage, DevOps Automation, Governance, and Day-2 Life Cycle Operations. Enterprises with large-scale cloud implementations may require more robust cloud management tools which include specific characteristics, such as the ability to manage multiple platforms from a single point of reference, or intelligent analytics to automate processes like application lifecycle management. High-end cloud management tools should also have the ability to handle system failures automatically with capabilities such as self-monitoring, an explicit notification mechanism, and include failover and self-healing capabilities. == Multi-Cloud and Hybrid Cloud Management Challenges == Legacy management infrastructures, which are based on the concept of dedicated system relationships and architecture constructs, are not well suited to cloud environments where instances are continually launched and decommissioned. Instead, the dynamic nature of cloud computing requires monitoring and management tools that are adaptable, extensible and customizable. Cloud computing presents a number of management challenges. Companies using public clouds do not have ownership of the equipment hosting the cloud environment, and because the environment is not contained within their own networks, public cloud customers do not have full visibility or control. Users of public cloud services must also integrate with an architecture defined by the cloud provider, using its specific parameters for working with cloud components. Integration includes tying into the cloud APIs for configuring IP addresses, subnets, firewalls and data service functions for storage. Because control of these functions is based on the cloud provider’s infrastructure and services, public cloud users must integrate with the cloud infrastructure management. Capacity management is a challenge for both public and private cloud environments because end users have the ability to deploy applications using self-service portals. Applications of all sizes may appear in the environment, consume an unpredictable amount of resources, then disappear at any time. A possible solution is profiling the applications impact on computational resources. As result, the performance models allow the prediction of how resource utilization changes according to application patterns. Thus, resources can be dynamically scaled to meet the expected demand. This is critical to cloud providers that need to provision resources quickly to meet a growing demand by their applications. Charge-back—or, pricing resource use on a granular basis—is a challenge for both public and private cloud environments. Charge-back is a challenge for public cloud service providers because they must price their services competitively while still creating profit. Users of public cloud services may find charge-back challenging because it is difficult for IT groups to assess actual resource costs on a granular basis due to overlapping resources within an organization that may be paid for by an individual business unit, such as electrical power. For private cloud operators, charge-back is fairly straightforward, but the challenge lies in guessing how to allocate resources as closely as possible to actual resource usage to achieve the greatest operational efficiency. Exceeding budgets can be a risk. Hybrid cloud environments, which combine public and private cloud services, sometimes with traditional infrastructure elements, present their own set of management challenges. These include security concerns if sensitive data lands on public cloud servers, budget concerns around overuse of storage or bandwidth and proliferation of mismanaged images. Managing the information flow in a hybrid cloud environment is also a significant challenge. On-premises clouds must share information with applications hosted off-premises by public cloud providers, and this information may change constantly. Hybrid cloud environments also typically include a complex mix of policies, permissions and limits that must be managed consistently across both public and private clouds. == Cloud Management Platforms (CMP) == CMPs provide a means for a cloud service customer to manage the deployment and operation of applications and associated datasets across multiple cloud service infrastructures, including both on-premises cloud infrastructure and public cloud service provider infrastructure. In other words, CMPs provide management capabilities for hybrid cloud and multi-cloud environments. A cloud management platform (CMP) provides broad cloud management functionality atop both public cloud provider platforms and private cloud platforms. CMPs manage cloud services and resources that are distributed across multiple cloud platforms. The value of CMPs stands in delivering the maximum level of consistency between platforms without comp

GPU switching

GPU switching is a mechanism used on computers with multiple graphic controllers. This mechanism allows the user to either maximize the graphic performance or prolong battery life by switching between the graphic cards. It is mostly used on gaming laptops which usually have an integrated graphic device and a discrete video card. == Basic components == Most computers using this feature contain integrated graphics processors and dedicated graphics cards that applies to the following categories. === Integrated graphics === Also known as: Integrated graphics, shared graphics solutions, integrated graphics processors (IGP) or unified memory architecture (UMA). This kind of graphics processors usually have much fewer processing units and share the same memory with the CPU. Sometimes the graphics processors are integrated onto a motherboard. It is commonly known as: on-board graphics. A motherboard with on-board graphics processors doesn't require a discrete graphics card or a CPU with graphics processors to operate. === Dedicated graphics cards === Also known as: discrete graphics cards. Unlike integrated graphics, dedicated graphics cards have much more processing units and have its own RAM with much higher memory bandwidth. In some cases, a dedicated graphics chip can be integrated onto the motherboards, B150-GP104 for example. Regardless of the fact that the graphics chip is integrated, it is still counted as a dedicated graphics cards system because the graphics chip is integrated with its own memory. == Theory == Most Personal Computers have a motherboard that uses a Southbridge and Northbridge structure. === Northbridge control === The Northbridge is one of the core logic chipset that handles communications between the CPU, GPU, RAM and the Southbridge. The discrete graphics card is usually installed onto the graphics card slot such as PCI-Express and the integrated graphics is integrated onto the CPU itself or occasionally onto the Northbridge. The Northbridge is the most responsible for switching between GPUs. The way how it works usually has the following process (refer to the Figure 1. on the right): The Northbridge receives input from Southbridge through the internal bus. The Northbridge signals to CPU through the Front-side bus. The CPU runs the task assignment application (usually the graphics card driver) to determine which GPU core to use. The CPU passes down the command to the Northbridge. The Northbridge passes down the command to the according GPU core. The GPU core processes the command and returns the rendered data back to the Northbridge. The Northbridge sends the rendered data back to Southbridge. === Southbridge control === The Southbridge is a set of integrated circuits such Intel's I/O Controller Hub (ICH). It handles all of a computer's I/O functions, such as receiving the keyboard input and outputting the data onto the screen. The way how it usually works usually has two steps: Take in the user input and pass it down to the Northbridge. (Optional) Receive the rendered data from the Northbridge and output it. The reason why the second step can be optional is that sometimes the rendered the data is outputted directly from the discrete graphics card which is located on the graphics card slot so there is no need to output the data through the Southbridge. == Main purpose == GPU switching is mostly used for saving energy by switching between graphic cards. The dedicated graphics cards consume much more power than integrated graphics but also provides higher 3D performances, which is needed for a better gaming and CAD experience. Following is a list of the TDPs of the most popular CPU with integrated graphics and dedicated graphics cards. The dedicated graphics cards exhibit much higher power consumption than the integrated graphics on both platforms. Disabling them when no heavy graphics processing is needed can significantly lower the power consumption. == Technologies == === Nvidia Optimus === Nvidia Optimus™ is a computer GPU switching technology created by Nvidia that can dynamically and seamlessly switch between two graphic cards based on running programs. === AMD Enduro === AMD Enduro™ is a collective brand developed by AMD that features many new technologies that can significantly save power. It was previously named as: PowerXpress and Dynamic Switchable Graphics (DSG). This technology implements a sophisticated system to predict the potential usage need for graphics cards and switch between graphics cards based on predicted need. This technology also introduces a new power control plan that allows the discrete graphics cards consume no energy when idling. == Manufacturers == === Integrated graphics === In personal computers, the IGP (integrated graphics processors) are mostly manufactured by Intel and AMD and are integrated onto their CPUs. They are commonly known as: Intel HD and Iris Graphics - also called HD series and Iris series AMD Accelerated Processing Unit (APU) - also formerly known as: fusion === Dedicated graphics cards === The most popular dedicated graphics cards are manufactured by AMD and Nvidia. They are commonly known as: AMD Radeon Nvidia GeForce == Drivers and OS support == Most common operating systems have built-in support for this feature. However, the users may download the updated drivers from Nvidia or AMD for better experience. === Windows support === Windows 7 has built-in support for this feature. The system automatically switches between GPUs depending on the program that's running. However, the user may switch the GPUs manually through device manager or power manager. === Linux === Modern Linux systems handle hybrid graphics in two parts: power/control for the inactive GPU, and optional render offloading for individual applications. vga_switcheroo (in the kernel since 2.6.34) coordinates power and mux control on systems with multiple GPUs. It was designed primarily for muxed designs (hardware display switch), and on muxless laptops it is typically used only for power control. A display server restart is no longer required for offloading on muxless systems. DRI PRIME (Mesa) enables per-process render offload on muxless systems: an app renders on the discrete GPU and the integrated GPU presents the result. Users can opt in via the DRI_PRIME environment variable (e.g., DRI_PRIME=1) or desktop integration. On GNOME, the switcheroo-control service exposes the discrete GPU to the shell, adding a “Launch using Discrete Graphics Card” entry to app menus on supported systems (Wayland or Xorg), which invokes render offload under the hood. With the proprietary Nvidia driver, render offload is provided as PRIME Render Offload (supported since driver 435.xx). Distributions commonly ship a helper like prime-run or desktop menu entries that set the required environment for offloading. ==== Notes and limitations (Linux) ==== On muxless systems the internal display is hard-wired to the integrated GPU; the discrete GPU cannot directly drive that panel and instead renders offscreen for composition by the iGPU. External displays connected to the dGPU may allow direct output depending on the laptop’s wiring. Power-saving behavior varies by driver and distro defaults. Some setups need explicit configuration to power down the inactive GPU when idle. Desktop integrations (e.g., GNOME's menu item) simply opt an app into offload; they do not "auto-switch" the whole session. Users can still launch apps on either GPU as needed.

Digital content

Digital content is any content that exists in the form of digital data. Digital content is stored on digital media or analog storage in specific formats. Forms of digital content include information that is digitally broadcast, streamed, or contained in computer files. Viewed narrowly, digital content includes popular media types, while a broader approach considers any type of digital information (e. g. digitally updated weather forecasts, GPS maps, and so on) as digital content. Digital content has increased as more households have accessed the Internet. Expanded access has made it easier for people to receive their news and watch TV online, challenging the popularity of traditional platforms. Increased access to the Internet has also led to the mass publication of digital content through individuals in the form of eBooks, blog posts, and even Facebook posts. == History == At the beginning of the Digital Revolution, computers facilitated the discovery, retrieval, and creation of new information in every field of human knowledge. As information became increasingly more accessible, the Digital Revolution also facilitated the creation of digital content. Despite an evolution to digital technology, which occurred somewhere between the late 1970s, distribution of digital content did not begin until the late 1990s with the rise in popularity of the Internet. In the past, digital content was primarily distributed through computers and the Internet. Methods of distribution are rapidly changing as the Digital Revolution brings new channels, such as mobile apps and eBooks. These new technologies will create challenges for content creators, as they determine the best channel to bring content to their consumers. Despite the benefits, new technologies have created new intellectual property issues. Users can easily share, modify, and redistribute content outside of the creator's control. While new technologies have made digital content available to large audiences, managing copyright and limiting content movement will continue to be an issue that digital content creators face in the future. == Types of digital content == Examples include: Video – Types of video content include home videos, music videos, TV shows, and movies. Many of these can be viewed on websites such as YouTube, Hulu, Paramount+, Disney+, HBO Max, and so on, in which people and companies alike can post content. However, many movies and television shows are not available for free legally, but rather can be purchased from sites such as iTunes and Amazon. Audio – Music is the most common form of audio. Spotify has emerged as a popular way for people to listen to music either over the Internet or from their computer desktop. Digital content in the form of music is also available through Pandora and last.fm, both of which allow listeners to listen to music online for no charge. Images – Photo and image sharing is another example of digital content. Popular sites used for this type of digital content includes Imgur, where people share self-created pictures, Flickr, where people share their photo albums, and DeviantArt, where people share their artwork. Popular apps that are used for images include Instagram and Snapchat. Visual Stories - Stories are a new type of digital content that got introduced by Snapchat. Since then, stories as a format has been introduced in a couple of other platforms such as Facebook and Linkedin. In 2018, Google introduced their AMP Stories, which provides content publishers with a mobile-focused format for delivering news and information as visually rich, tap-through stories. Text - Type of digital content which is available in text or written format. Blog websites which store data in form of textual format. === Paid digital content === In order to have access to more premium digital goods, consumers usually have to pay an upfront charge for digital content, or a subscription based fee. Video – Many licensed videos, such as movies and television shows, require money in order to be viewed or downloaded. Popular services used by many include streaming giant Netflix and Amazon's streaming service, as well as recent notice put forth by the online video platform YouTube. Audio – While songs can be streamed for free, generally in order to download most licensed music, consumers need to purchase songs from web stores, such as the popular iTunes. However, Spotify Premium is emerging as a new model for purchasing digital content on the web: consumers pay a monthly fee to unlimited streaming and downloading from Spotify's music library. According to a report done by IHS Inc. in 2013, the global consumer spending on digital content grew to over $57 billion in 2013, which was up almost 30% from $44 billion in 2012. In past years, the US has always been a leader in consumer expenditure on digital content, but as of 2013, many countries have emerged with great consumer expenditure. South Korea's overall digital spend per capita is now greater than the US. ==== Consolidation ==== According to research firm Ampere Analysis, in 2024, a small group of six media conglomerates; Disney, Comcast, Google, Warner Bros. Discovery, Netflix, and Paramount Global—are poised to dominate the global content market. These companies are projected to account for 51% of all global spending on content, a significant increase from 47% in 2020. Disney, in particular, is a major player, with an estimated $35.8 billion investment in television and film content, representing 14% of global spending. This significant increase, fueled by Disney's full ownership of Hulu, highlights the company's strategic focus on streaming services. A substantial portion of the projected $126 billion global content spending is allocated to streaming platforms. === Non-purchasable digital content === Not all digital content is purchasable, and is simply anything published digitally. This would include: News – in recent years newspapers have attempted to expand their readership by creating access to their newspapers digitally. As of 2012, 39% of readers learned about news from online formats, making news a prevalent form of digital content. Advertisements – as media consumers increasingly use digital formats to watch TV, check the weather, and search for content, advertisements have shifted to digital forms to keep up with their viewership. Advertisements are now being made digitally and placed on sites ranging from Facebook to YouTube. Question and Answer sites – these sites are a type of Internet forum where people can post questions they want answered, or provide responses to previous inquiries. With millions of questions posted each day, anyone has the ability to create content on these sites, so the information provided may not be 100% reliable or accurate. Popular sites include Yahoo! Answers, WikiAnswers and Quora. Web mapping – sites such as MapQuest and Google Maps provide users with map content. These sites give people the ability to quickly look up the location of a landmark and create routes to a destination. Online maps are a form of free content provided by companies such as Google and AOL, serving as much more efficient alternatives to the traditional Thomas Guide. == Business implications == === Digital companies === Digital content businesses can include news, information, and entertainment distributed over the Internet and consumed digitally by both consumers and businesses. Based on revenue, the leading digital businesses are ranked Google, China Mobile, Bloomberg, Reed Elsevier, and Apple. The 50 companies with the highest revenue are split between those offering free and paid digital content, but these top 50 companies combined generate revenue of $150 billion. === Educational opportunities === Programs such as CUNY's Macaulay Honors College in their New Media Lab, run by industry professional Robert Small, is set up to train and introduce students to the various disciplines within the digital content industry. The goal is to offer information and access to professional work opportunities. They also explore within an incubator how to create businesses and start ups within the world of digital content. There are many educational events in support of choosing digital content as a career. === Government support === The Irish government adopted a "Strategy for the Digital Content Industry in Ireland" in 2002.

Digital goods

Digital goods or e-goods are intangible goods that exist in digital form. Examples are Wikipedia articles; digital media, such as e-books, downloadable music, internet radio, internet television and streaming media; fonts, logos, photos and graphics; digital subscriptions; online ads (as purchased by the advertiser); internet coupons; electronic tickets; electronically treated documentation in many different fields; downloadable software (Digital Distribution) and mobile apps; cloud-based applications and online games; virtual goods used within the virtual economies of online games and communities; community access; workbooks; worksheets; planners; e-learning (online courses); webinars, video tutorials, blog posts; cards; patterns; website themes and templates. == Legal concerns about digital goods == Special legal concerns regarding digital goods include copyright infringement and taxation. Also the question of the ownership (versus licensed use or service only) of purely digital goods is not finally resolved. For instance, the software installers of the digital software distributor gog.com are technically independent to the account but are still subject to the EULA, where a "licensed, not sold" formulation is used. Therefore, it is not clear if the software can be legally used after a hypothetical loss of the account; a question which was also raised before in practice for the similar service Steam. In July 2012, the European Court of Justice ruled in the case UsedSoft GMbH v. Oracle International Corp. that the sale of a software product, either through a physical support or download, constituted a transfer of ownership in EU law, thus the first sale doctrine applies; the ruling thereby breaks the "licensed, not sold" legal theory, but leaves open numerous questions. Therefore, it is also permissible to resell software licenses even if the digital good has been downloaded directly from the Internet, as the first-sale doctrine applied whenever software was originally sold to a customer for an unlimited amount of time, thus prohibiting any software maker from preventing the resale of their software by any of their legitimate owners. The court requires that the previous owner must no longer be able to use the licensed software after the resale, but finds that the practical difficulties in enforcing this clause should not be an obstacle to authorizing resale, as they are also present for software which can be installed from physical supports, where the first-sale doctrine is in force. In several cases, content providers have faced criticism for revoking access to digital goods due to expired licenses or the discontinuation of a product, such as ebooks (which resulted in a lawsuit against Amazon.com, Inc.), digital video (with Sony Interactive Entertainment revoking access to purchased StudioCanal content from its now-defunct PlayStation video store; a similar move involving Warner Bros. Discovery content was averted by an updated license agreement), and video games (such as Ubisoft discontinuing and revoking access to its game The Crew without providing refunds or the ability to redownload the game) In September 2024, the U.S. state of California implemented a consumer protection law that prohibits the use of terms such as "buy" or "purchase" during transactions involving digital goods if there is no way to obtain the purchases in a manner that cannot be revoked by the seller (such as allowing it to be downloaded for permanent, offline access), and requires a disclaimer to be displayed to the customer at the time of purchase.

Apache Parquet

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem inspired by Google Dremel interactive ad-hoc query system for analysis of read-only nested data. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == The open-source project to build Apache Parquet began as a joint effort between Twitter and Cloudera using the record shredding and assembly algorithm as described in Google's Dremel. Parquet was designed as an improvement on the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop. The name 'parquet' (lit. 'small compartment') refers to a style of decorative flooring and was chosen to "evoke the bottom layer of a database with an interesting layout". The first version, Apache Parquet 1.0, was released in July 2013. Since April 27, 2015, Apache Parquet has been a top-level Apache Software Foundation (ASF)-sponsored project. == Features == Apache Parquet is implemented using the record-shredding and assembly algorithm, which accommodates the complex data structures that can be used to store data. The values in each column are stored in contiguous memory locations, providing the following benefits: Column-wise compression is efficient in storage space Encoding and compression techniques specific to the type of data in each column can be used Queries that fetch specific column values need not read the entire row, thus improving performance Apache Parquet is implemented using the Apache Thrift framework, which increases its flexibility; it can work with a number of programming languages like C++, Java, Python, PHP, etc. As of August 2015, Parquet supports the big-data-processing frameworks including Apache Hive, Apache Drill, Apache Impala, Apache Crunch, Apache Pig, Cascading, Presto and Apache Spark. It is one of the external data formats used by the pandas Python data manipulation and analysis library. == Compression and encoding == In Parquet, compression is performed column by column, which enables different encoding schemes to be used for text and integer data. This strategy also keeps the door open for newer and better encoding schemes to be implemented as they are invented. Parquet supports various compression formats: snappy, gzip, LZO, brotli, zstd, and LZ4. === Dictionary encoding === Parquet has an automatic dictionary encoding enabled dynamically for data with a small number of unique values (i.e. below 105) that enables significant compression and boosts processing speed. === Bit packing === Storage of integers is usually done with dedicated 32 or 64 bits per integer. For small integers, packing multiple integers into the same space makes storage more efficient. === Run-length encoding (RLE) === To optimize storage of multiple occurrences of the same value, run-length encoding is used, which is where a single value is stored once along with the number of occurrences. Parquet implements a hybrid of bit packing and RLE, in which the encoding switches based on which produces the best compression results. This strategy works well for certain types of integer data and combines well with dictionary encoding. == Cloud Storage and Data Lakes == Parquet is widely used as the underlying file format in modern cloud-based data lake architectures. Cloud storage systems such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage commonly store data in Parquet format due to its efficient columnar representation and retrieval capabilities. Data lakehouse frameworks—including Apache Iceberg, Delta Lake, and Apache Hudi —build an additional metadata layer on top of Parquet files to support features such as schema evolution, time-travel queries, and ACID-compliant transactions. In these architectures, Parquet files serve as the immutable storage layer while the table formats manage data versioning and transactional integrity. == Comparison == Apache Parquet is comparable to RCFile and Optimized Row Columnar (ORC) file formats — all three fall under the category of columnar data storage within the Hadoop ecosystem. They all have better compression and encoding with improved read performance at the cost of slower writes. In addition to these features, Apache Parquet supports limited schema evolution, i.e., the schema can be modified according to the changes in the data. It also provides the ability to add new columns and merge schemas that do not conflict. Apache Arrow is designed as an in-memory complement to on-disk columnar formats like Parquet and ORC. The Arrow and Parquet projects include libraries that allow for reading and writing between the two formats. == Implementations == Known implementations of Parquet include:

SitePal

SitePal is a speaking avatar platform for small and medium-sized businesses developed by Oddcast. SitePal allows users to deploy "virtual employees" on websites that can welcome visitors, guide them around the site and answer questions. The use of SitePal on commercial websites has been controversial because many visitors report finding them annoying. Some research has shown that they can increase sales in comparison to using static photographs. == Development == The technology used was the result of more than 4 years of research at Stanford University. The research was based on a literature review and other previous work in the field of artificial intelligence research. The SitePal AI option uses the AIML programming language, which is partially editable by users. This allows web designers to simulate normal human conversation by using keywords or key phrases that the bot can respond to. == Features == The company provides web designers with options to customize the chosen avatar. A large selection of faces, clothing, hair, backgrounds, voices and other details are available. If a web designer wants to use a particular face, Sitepal can create one from a photo. Thus, a mascot or a known face can be simulated. == Speech == Sitepal avatars talk through text-to-speech (tts) software. A short paragraph can be written (up to 900 characters) and the text-to-speech engine will compile the actual speech, which can be reproduced and edited. The tts engine is not perfect, but it comes close to actual speech and is easy to understand. Tts can be further enhanced by some commands, like /laugh and /loud which make the avatar laugh or talk loud. Even pronunciation is possible. The web designer can record and upload his or her own audio messages. Alternatively Sitepal offers professional voice acting service at extra cost. == User interaction == The company provides 5 options for visitor interaction: No interaction. The avatar simply says a pre-fixed message. FAQ mode. Questions can be configured, which are clickable and the user can hear the answer. Lead mode. The avatar prompts the user to type his email and short message, so it can be sent to the webmaster (usually used on a "contact us" page) Chatbot mode. The avatar greets the user, and he can type his questions and have a conversation with the bot. With predetermined replies, this can work as an FAQ as well. API customization. Experienced programmers can make their avatar interact with their website, making it talk when the user clicks on a link or when other triggers occur. Even dual avatar conversations can be created, like a talk show. == Posting options == The company provides five options for posting the avatar: Embed in webpage (via javascript) Embed in HTML Send by email Publish to eBay Embed in Flash == Criticism == Early reviews, such as one by Troy Dreier published in PC World in 2002 were positive and described SitePal as: "an engagingly simple and personal tool, and the price is reasonable for what it adds to a site". Although Dreier did note that the program had "bugs that suggested it hadn't been tested thoroughly". In more recent years, reaction to SitePal has been much more negative with reviews such as Tom Spring writing in a PC World review citing SitePal ads and described his reaction as "Not so nice". Paul Bissex, writing in E-Scribe News described SitePal as "heinous... and embarrassing if anyone is within earshot...they creep me out" == Research on effectiveness == In one single-website research project Anita Campbell had half the visitors to Small Business Trends see a SitePal and the other half see just a static photograph. Over 11,000 visitors the SitePal avatar improved sign-up for a newsletter 144% over the control condition.