AI For Business Specialization Upenn

AI For Business Specialization Upenn — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Google Cloud Dataflow

    Google Cloud Dataflow

    Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem. Dataflow provides a fully managed service for executing Apache Beam pipelines, offering features like autoscaling, dynamic work rebalancing, and a managed execution environment. Dataflow is suitable for large-scale, continuous data processing jobs, and is one of the major components of Google's big data architecture on the Google Cloud Platform. At its core, Dataflow's architecture is designed to abstract away infrastructure management, allowing developers to focus purely on the logic of their data processing tasks. When a pipeline written using the Apache Beam SDK is submitted, Dataflow translates this high-level definition into an optimized job graph. The service then provisions and manages a fleet of Google Compute Engine workers to execute this graph in a highly parallelized and fault-tolerant manner. This serverless approach, combined with intelligent autoscaling of both the number of workers (horizontal) and the resources per worker (vertical), ensures that jobs have the precise amount of computational power needed at any given time, optimizing both performance and cost. The service's deep integration with the Google Cloud ecosystem makes it a powerful tool for a variety of use cases beyond simple data movement. For real-time analytics, Dataflow can ingest unbounded streams of data from Cloud Pub/Sub, perform complex transformations, and load results into BigQuery for immediate querying. In machine learning workflows, it is commonly used to preprocess and transform massive datasets stored in Cloud Storage, preparing them for training models in Vertex AI. This versatility makes it the central processing engine for modern ETL (Extract, Transform, Load) operations, streaming analytics, and large-scale data preparation within the cloud. == History == Google Cloud Dataflow was announced in June, 2014 and released to the general public as an open beta in April, 2015. In January, 2016 Google donated the underlying SDK, the implementation of a local runner, and a set of IOs (data connectors) to access Google Cloud Platform data services to the Apache Software Foundation. The donated code formed the original basis for Apache Beam. In August 2022, there was an incident where user timers were broken for certain Dataflow streaming pipelines in multiple regions, which was later resolved. Throughout 2023 and 2024, there have been various other updates and incidents affecting Google Cloud Dataflow, as documented in the release notes and service health history. The donation of the Dataflow SDK to the Apache Software Foundation was a pivotal moment, establishing Apache Beam as a unified, open-source programming model for defining both batch and streaming data pipelines. This strategic move decoupled the pipeline definition from the execution engine. As a result, developers could write portable data processing logic that was not locked into Google's ecosystem. A Beam pipeline can be executed on various runners, including Apache Flink, Apache Spark, and, of course, the highly optimized Google Cloud Dataflow service, providing flexibility and future-proofing data processing investments. == Features == Google Cloud Dataflow supports both batch and streaming data processing pipelines. It automatically handles resource provisioning, data sharding, and scaling according to workload, reducing manual configuration needed for large-scale data operations. == Use cases == Dataflow is used for ETL (Extract, Transform, Load) data pipelines, real-time analytics, and event stream processing for companies in industries such as finance, advertising, and IoT.

    Read more →
  • List of cryptography journals

    List of cryptography journals

    List of cryptography journals includes notable peer-reviewed academic journals that focus on cryptography, cryptanalysis, information security, and related areas in computer science and mathematics. == Notable journals == Cryptologia Designs, Codes and Cryptography IEEE Transactions on Information Theory International Journal of Information Security Journal of Cryptology Journal of Computer Security ACM Transactions on Privacy and Security Information Processing Letters Information and Computation

    Read more →
  • Semiotics of social networking

    Semiotics of social networking

    The semiotics of social networking discusses the images, symbols and signs used in systems that allow users to communicate and share experiences with each other. Examples of social networking systems include Facebook, Twitter and Instagram. == Semiotics == Semiotics is a discipline that studies images, symbols, signs and other similarly related objects in an effort to understand their use and meaning. Semiotic structuralism seeks the meaning of these objects within a social context. Post-structuralist theories take tools from structuralist semiotics in combination with social interaction, creating social semiotics. Social semiotics is “a branch of the field of semiotics which investigates human signifying practices in specific social and cultural circumstances and which tries to explain meaning-making as a social practice.” “Social semiotics also examines semiotic practices, specific to a culture and community, for the making of various kinds of texts and meanings in various situational contexts and contexts of culturally meaningful activity”. Social semiotics is concerned with studying human interactions. == Social networking == Social networking is the communication among people within a virtual social space. This medium of communication allows insight into the significance of social semiotics. “Millions of people now interact through blogs, collaborate through wikis, play multiplayer games, publish podcasts and video, build relationships through social network sites and evaluate all the above forms of communication through feedback and ranking mechanisms”. Social semiotics “unlike speech, writing necessitates some sort of technology in the form of person device interaction”. Social semiotics functions through the triad of communication or Peircean semiotics in the form of sign, object, interpretant (Chart 1) and “Human, Machine, Tag (Information)” (Chart 2). In Peircean semiotics (Chart 1), "A sign…[in the form of representamen] is something which stands to somebody for something in some respect or capacity. It addresses somebody, that is, creates in the mind of that person an equivalent sign, or perhaps a more developed sign. That sign which it creates I call the interpretant of the first sign. The sign stands for an object, not in all respects, but in reference to a sort of idea which I have something called the ground of the representamen". This example of the triangle of Human, Machine, Tag is shown when looking at tagging photographs on Facebook (Chart 3). The Human takes the photo on a camera and puts the digital file (information) on the Machine, the Machine is then navigated to Facebook where the file is downloaded. The Human has the Machine Tag the photo with information (e. g., names, places, data) for other Humans to see. This process then can be continued (see Chart 2). “Collaborative tagging has been quickly gaining ground because of its ability to recruit the activity of web users into effectively organizing and sharing large amounts of information”.

    Read more →
  • Omni-Path

    Omni-Path

    Omni-Path Architecture (OPA) is a high-performance communication architecture developed by Intel. It aims for low communication latency, low power consumption and a high throughput. It directly competes with InfiniBand. Intel planned to develop technology based on this architecture for exascale computing. The current owner of Omni-Path is Cornelis Networks. == History == Production of Omni-Path products started in 2015 and delivery of these products started in the first quarter of 2016. In November 2015, adapters based on the 2-port "Wolf River" ASIC were announced, using QSFP28 connectors with channel speeds up to 100 Gbit/s. Simultaneously, switches based on the 48-port "Prairie River" ASIC were announced. First models of that series were available starting in 2015. In April 2016, implementation of the InfiniBand "verbs" interface for the Omni-Path fabric was discussed. In October 2016, IBM, Hewlett Packard Enterprise, Dell, Lenovo, Samsung, Seagate Technology, Micron Technology, Western Digital and SK Hynix announced a joint consortium called Gen-Z to develop an open specification and architecture for non-volatile storage and memory products—including Intel's 3D Xpoint technology—which might in part compete against Omni-Path. Intel offered their Omni-Path products and components via other (hardware) vendors. For example, Dell EMC offered Intel Omni-Path as Dell Networking H-series, following the naming-standard of Dell Networking in 2017. In July 2019, Intel announced it would not continue development of Omni-Path networks and canceled OPA 200 series (200-Gbps variant of Omni-Path). In September 2020, Intel announced that the Omni-Path network products and technology would be spun out into a new venture with Cornelis Networks. Intel would continue to maintain support for legacy Omni-Path products, while Cornelis Networks continues the product line, leveraging existing Intel intellectual property related to Omni-Path architecture. In 2021, Cornelis announced Omni-Path Express, which replaces PSM2-based drivers and middleware, which trace back to PathScale's PSM created in 2003, for the existing Omni-Path hardware, with a native libfabric provider.

    Read more →
  • Smart speaker industry in South Korea

    Smart speaker industry in South Korea

    Smart speakers, or AI speakers, have been developed by multiple domestic electronics and telecommunications firms in South Korea. Since their introduction to the local market in 2016, they have been used by millions of people in the country. == Brands == === Google === In September 2018, Google Home (including the Google Home Mini) launched in South Korea. Running Google Assistant, it featured simultaneous recognition of two languages among a total of seven, including Korean. At launch, it could play music from Bugs!, in addition to YouTube. === Kakao === In November 2017, Kakao launched the Kakao Mini, featuring integrated KakaoTalk functionality. === KT === KT launched the GiGA Genie smart speaker in January 2017, using a Harman Kardon speaker. In November 2017, KT announced GiGA Genie LTE, a portable AI speaker with LTE support. They also released a mini speaker called GiGA Genie Buddy. In 2018, KT created a special version of GiGa Genie with a screen for use in hotels. On 29 April 2019, KT announced the GiGA Genie Table TV, a consumer-oriented smart speaker with a display. It featured paid TV access through Wi-Fi. Based on usage data from the hotel model, KT decided not to add a touchscreen. The Table TV also featured a limited-access "personalized-text-to-speech technology" which could use parents' voice recording inputs to read children books. In February 2022, KT began rolling out Amazon Alexa integration into its speakers for English support. === Naver === In August 2017, Naver announced the Wave smart speaker, operating on Clova. In October 2017, Naver launched the Friends smart speaker, which were designed based on Line characters. ==== LG Uplus ==== In December 2017, LG Uplus launched the Friends+ speaker with Naver, operating on U+ Home AI. === Samsung === In August 2018, Samsung announced the Samsung Galaxy Home in partnership with Spotify. The original size was delayed, while the Galaxy Home Mini appeared briefly as a bonus for Samsung Galaxy S20 preorders in South Korea in February 2020. === SK Telecom === SK Telecom launched the Nugu smart speaker in September 2016, using an Astell & Kern audio system. In August 2017, SKT released a portable speaker named Nugu mini. In July 2018, SKT launched the Nugu Candle, featuring expanded mood lighting. The first-generation Nugu was subsequently discontinued. On 18 April 2019, SKT released the NUGU Nemo AI, which featured a display and JBL stereo speaker. In August 2019, SKT collaborated with SM Entertainment, incorporating functions related to the agency's artists into Nugu. In January 2022, SKT showcased the NUGU Candle SE, introducing Alexa support. == Usage == In 2018, approximately 3 million people in South Korea used smart speakers. According to data from KT in 2018, the most common commands to its speakers were for controlling televisions. Based on a broader survey in 2017, music was selected as the most frequent use case. By 2018, smart speaker companies were partnering with reading and other education services, adding potential use-cases for children. By 2022, smart speakers were being utilized by the South Korean government. SKT, in partnership with 70 regional governments, distributed smart speakers to 12,000 senior citizens living alone. The government paid for monthly subscriptions to help seniors stay mentally engaged. Naver made an agreement with the Seoul Metropolitan Government to provide Clova CareCall, an automated health checkup program to hundreds of senior citizens living alone. KT's AI care service included an emergency dispatch call function and medication notifications. == Criticism == === Communication === In a survey of 300 users in 2017, approximately half reported having some type of communication issue with their smart speakers. === Privacy === South Korean smart speakers sparked privacy concerns when they were found to be collecting and documenting user audio data in 2019. The speaker companies responded that only a minority of data was collected and that it was anonymized. They stated that such recordings were collected for performance improvements.

    Read more →
  • Social media background check

    Social media background check

    A social media background check is an investigative technique that involves scrutinizing the social media profiles and activities of individuals, primarily for pre-employment screening and other official verifications. These checks are performed to review people's online behavioral history on social media websites such as Facebook, Twitter, and LinkedIn. Social media background checks have become a common part of recruitment processes, among other verification procedures. == History == In the early 21st century, with the rapid expansion of social media platforms such as Facebook, Twitter, and LinkedIn, employers began to use these channels to gather additional information about prospective employees. Initially, social media background checks were an informal aspect of recruitment, but they have gradually gained formal recognition as a crucial element in candidate screening. Proponents of social media background checks argue that such reviews provide insight into a candidate's professional interests and networks, though the reliability of such assessments remains contested among researchers. == Rise in society == The practice of social media background checks has seen a significant surge in the last decade. This rise can be attributed to the exponential increase in social media users and the growing awareness among organizations regarding the importance of hiring individuals who align with their values and culture. Various platforms provide services explicitly designed to conduct social media background checks efficiently, simplifying the process for businesses. Companies providing social media background check services, such as Ferretly and Certn, have received venture capital funding, reflecting investor interest in the sector. The incorporation of artificial intelligence into conducting AI-powered social media background checks also illustrates its continued popularity and that businesses are looking to ramp up and even automate their use. High-profile cases in which individuals faced employment or admission consequences for past social media posts have raised awareness of social media background checking practices. For example, director James Gunn faced termination from Marvel Studios in 2018 over past offensive tweets, though he was later rehired. Additionally, multiple college admissions officers have acknowledged reviewing applicants' social media profiles, though such practices vary by institution. == Evolution of ethical considerations == Social media background checks are not without controversy, raising significant ethical considerations that have evolved in recent years. Privacy advocates argue that social media background checks raise concerns about data use and discrimination, particularly given the use of personal information that may not reflect job-relevant behavior. Legal scholars debate whether reviewing publicly posted information constitutes a privacy violation under U.S. law. Researchers and critics note that social media profiles often present curated representations of users' lives and may not reflect workplace behavior or professional competence. Moreover, the accuracy of social media background checks has been called into question, with critics pointing out that these checks may not always yield reliable or comprehensive results. Critics also warn about potential misuse of information obtained from social media, including cyberbullying and harassment. A 2023 study by found that approximately 90% of employers incorporate social media into hiring processes, with over half of those surveyed reporting they had rejected candidates based on social media content. This informal approach operates largely outside federal compliance frameworks. Critics argue that without regulation, candidates lack dispute mechanisms available under regulatory frameworks like the Fair Credit Reporting Act (FCRA), which requires compliance when background checks formally influence employment decisions. In a hiring environment where the practice is already performed often on an individual basis, the introduction of systematic, regulated screening practices that meet federal compliance standards can present a better, fairer alternative for both employers and candidates. == Business considerations == From a business perspective, social media background checks can be a valuable tool in protecting an organization's reputation and maintaining a safe and respectful workplace environment. A well-conducted social media background check can identify potential red flags, helping to prevent instances of workplace harassment or other negative behaviors. However, businesses also face potential legal repercussions if social media background checks are conducted improperly, such as non-compliance with the Fair Credit Reporting Act (FCRA) in the United States. Critics argue that over-reliance on social media data may exclude qualified candidates whose professional competence is not reflected in their online presence. The proliferation of social media screening services has prompted legal and industry experts to emphasize the importance of compliance with the Fair Credit Reporting Act and relevant state privacy laws when conducting such checks.

    Read more →
  • Data set (IBM mainframe)

    Data set (IBM mainframe)

    In the context of IBM mainframe computers in the IBM System/360 line and its successors, a data set (IBM preferred) or dataset is a computer file having a record organization. Use of this term began with, e.g., DOS/360 and OS/360, and is still used by their successors, including the current VSE and z/OS. Documentation for these systems historically preferred this term rather than file. A data set is typically stored on a direct access storage device (DASD) or magnetic tape, however unit record devices, such as punch card readers, card punches, line printers and page printers can provide input/output (I/O) for a data set (file). Data sets are not unstructured streams of bytes, but rather are organized in various logical record and block structures determined by the DSORG (data set organization), RECFM (record format), and other parameters. These parameters are specified at the time of the data set allocation (creation), for example with Job Control Language DD statements. Within a running program they are stored in the Data Control Block (DCB) or Access Control Block (ACB), which are data structures used to access data sets using access methods. Records in a data set may be fixed, variable, or “undefined” length. == Data set organization == For OS/360, the DCB's DSORG parameter specifies how the data set is organized. It may be CQ Queued Telecommunications Access Method (QTAM) in Message Control Program (MCP) CX Communications line group DA Basic Direct Access Method (BDAM) GS Graphics device for Graphics Access Method(GAM) IS Indexed Sequential Access Method (ISAM) MQ QTAM message queue in application PO Partitioned Organization PS Physical Sequential among others. Data sets on tape may only be DSORG=PS. The choice of organization depends on how the data is to be accessed, and in particular, how it is to be updated. Programmers utilize various access methods (such as QSAM or VSAM) in programs for reading and writing data sets. Access method depends on the given data set organization. == Record format (RECFM) == Regardless of organization, the physical structure of each record is essentially the same, and is uniform throughout the data set. This is specified in the DCB RECFM parameter. RECFM=F means that the records are of fixed length, specified via the LRECL parameter. RECFM=V specifies a variable-length record. V records when stored on media are prefixed by a Record Descriptor Word (RDW) containing the integer length of the record in bytes and flag bits. With RECFM=FB and RECFM=VB, multiple logical records are grouped together into a single physical block on tape or DASD. FB and VB are fixed-blocked, and variable-blocked, respectively. RECFM=U (undefined) is also variable length, but the length of the record is determined by the length of the block rather than by a control field. The BLKSIZE parameter specifies the maximum length of the block. RECFM=FBS could be also specified, meaning fixed-blocked standard, meaning all the blocks except the last one were required to be in full BLKSIZE length. RECFM=VBS, or variable-blocked spanned, means a logical record could be spanned across two or more blocks, with flags in the RDW indicating whether a record segment is continued into the next block and/or was continued from the previous one. This mechanism eliminates the need for using any "delimiter" byte value to separate records. Thus data can be of any type, including binary integers, floating-point, or characters, without introducing a false end-of-record condition. The data set is an abstraction of a collection of records, in contrast to files as unstructured streams of bytes. == Partitioned data set == A partitioned data set (PDS) is a data set containing multiple members, each of which holds a separate sub-data set, similar to a directory in other types of file systems. This type of data set is often used to hold load modules (old format bound executable programs), source program libraries (especially Assembler macro definitions), ISPF screen definitions, and Job Control Language. A PDS may be compared to a Zip file or COM Structured Storage. A Partitioned Data Set can only be allocated on a single volume and have a maximum size of 65,535 tracks. Besides members, a PDS contains also a directory. Each member can be accessed indirectly via the directory structure. Once a member is located, the data stored in that member are handled in the same manner as a PS (sequential) data set. Whenever a member is deleted, the space it occupied is unusable for storing other data. Likewise, if a member is re-written, it is stored in a new spot at the back of the PDS and leaves wasted “dead” space in the middle. The only way to recover “dead” space is to perform file compression. Compression, which is done using the IEBCOPY utility, moves all members to the front of the data space and leaves free usable space at the back. (Note that in modern parlance, this kind of operation might be called defragmentation or garbage collection; data compression nowadays refers to a different, more complicated concept.) PDS files can only reside on DASD, not on magnetic tape, in order to use the directory structure to access individual members. Partitioned data sets are most often used for storing multiple job control language files, utility control statements, and executable modules. An improvement of this scheme is a Partitioned Data Set Extended (PDSE or PDS/E, sometimes just libraries) introduced with DFSMSdfp for MVS/XA and MVS/ESA systems. A PDS/E library can store program objects or other types of members, but not both. BPAM cannot process a PDS/E containing program objects. PDS/E structure is similar to PDS and is used to store the same types of data. However, PDS/E files have a better directory structure which does not require pre-allocation of directory blocks when the PDS/E is defined (and therefore does not run out of directory blocks if not enough were specified). Also, PDS/E automatically stores members in such a way that compression operation is not needed to reclaim "dead" space. PDS/E files can only reside on DASD in order to use the directory structure to access individual members. == Generation Data Group == A Generation Data Group (GDG) is a group of non-VSAM data sets that are successive generations of historically-related data stored on an IBM mainframe (running OS/360 and its successors or DOS/360 and its successors). A GDG is usually cataloged. An individual member of the GDG collection is called a "Generation Data Set." The latter may be identified by an absolute number, ACCTG.OURGDG(1234), or a relative number: (-1) for the previous generation, (0) for the current one, and (+1) the next generation. A GDG specifies how many generations of a data set are to be kept and at what age a generation will be deleted. Whenever a new generation is created, the system checks whether one or more obsolete generations are to be deleted. The purpose of GDGs is to automate archival, using the command language JCL, the data set name given is generic. When DSN appears, the GDG data set appears along with the history number, where (0) is the most recent version (-1), (-2), ... are previous generations (+1) a new generation (see DD) Another use of GDGs is to be able to address all generations simultaneously within a JCL script without having to know the number of currently available generations. To do this, you have to omit the parentheses and the generation number in the JCL when specifying the dataset. === GDG JCL & features === Generation Data Groups are defined using either the BLDG statement of the IEHPROGM utility or the DEFINE GENERATIONGROUP statement of the newer IDCAMS utility, which allows setting various parameters. LIMIT(10) would limit the number of generations limit to 10. SCRATCH FOR (91) would retain each member, up to the limited#generations, at least 91 days. IDCAMS can also delete (and optionally uncatalog) a GDG. ==== Example ==== Creation of a standard GDG for five safety scopes, each at least 35 days old: Delete a standard GDG:

    Read more →
  • Group key

    Group key

    In cryptography, a group key is a cryptographic key that is shared between a group of users. Typically, group keys are distributed by sending them to individual users, either physically, or encrypted individually for each user using either that user's pre-distributed private key. A common use of group keys is to allow a group of users to decrypt a broadcast message that is intended for that entire group of users, and no one else. For example, in the Second World War, group keys (known as "iodoforms", a term invented by a classically educated non-chemist, and nothing to do with the chemical of the same name) were sent to groups of agents by the Special Operations Executive. These group keys allowed all the agents in a particular group to receive a single coded message. In present-day applications, group keys are commonly used in conditional access systems, where the key is the common key used to decrypt the broadcast signal, and the group in question is the group of all paying subscribers. In this case, the group key is typically distributed to the subscribers' receivers using a combination of a physically distributed secure cryptoprocessor in the form of a smartcard and encrypted over-the-air messages.

    Read more →
  • Digital image

    Digital image

    A digital image is an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or gray level that is an output from its two-dimensional functions fed as input by its spatial coordinates denoted with x, y on the x-axis and y-axis, respectively. An image can be vector or raster type. By itself, the term "digital image" usually refers to raster images or bitmapped images (as opposed to vector images). == Raster == Raster images have a finite set of digital values, called picture elements or pixels. The digital image contains a fixed number of rows and columns of pixels. Pixels are the smallest individual element in an image, holding quantized values that represent the brightness of a given color at any specific point. Typically, the pixels are stored in computer memory as a raster image or raster map, a two-dimensional array of small integers. These values are often transmitted or stored in a compressed form. Raster images can be created by a variety of input devices and techniques, such as digital cameras, scanners, coordinate-measuring machines, seismographic profiling, airborne radar, and more. They can also be synthesized from arbitrary non-image data, such as mathematical functions or three-dimensional geometric models; the latter being a major sub-area of computer graphics. The field of digital image processing is the study of algorithms for their transformation. === Raster file formats === Most users come into contact with raster images through digital cameras, which use any of several image file formats. Some digital cameras give access to almost all the data captured by the camera, using a raw image format. The Universal Photographic Imaging Guidelines (UPDIG) suggests these formats be used when possible since raw files produce the best quality images. These file formats allow the photographer and the processing agent the greatest level of control and accuracy for output. Their use is inhibited by the prevalence of proprietary information (trade secrets) for some camera makers, but there have been initiatives such as OpenRAW to influence manufacturers to release these records publicly. An alternative may be Digital Negative (DNG), a proprietary Adobe product described as "the public, archival format for digital camera raw data". Although this format is not yet universally accepted, support for the product is growing, and increasingly professional archivists and conservationists, working for respectable organizations, variously suggest or recommend DNG for archival purposes. == Vector == Vector images resulted from mathematical geometry (vector). In mathematical terms, a vector consists of both a magnitude, or length, and a direction. Often, both raster and vector elements will be combined in one image; for example, in the case of a billboard with text (vector) and photographs (raster). Example of vector file types are EPS, PDF, and AI. == Image viewing == Image viewer software displayed on images. Web browsers can display standard internet images formats including JPEG, GIF and PNG. Some can show SVG format which is a standard W3C format. In the past, when the Internet was still slow, it was common to provide "preview" images that would load and appear on the website before being replaced by the main image (to give a preliminary impression). Now Internet is fast enough and this preview image is seldom used. Some scientific images can be very large (for instance, the 46 gigapixel size image of the Milky Way, about 194 GB in size). Such images are difficult to download and are usually browsed online through more complex web interfaces. Some viewers offer a slideshow utility to display a sequence of images. == History == Early digital fax machines such as the Bartlane cable picture transmission system preceded digital cameras and computers by decades. The first picture to be scanned, stored, and recreated in digital pixels was displayed on the Standards Eastern Automatic Computer (SEAC) at NIST. The advancement of digital imagery continued in the early 1960s, alongside development of the space program and in medical research. Projects at the Jet Propulsion Laboratory, MIT, Bell Labs and the University of Maryland, among others, used digital images to advance satellite imagery, wirephoto standards conversion, medical imaging, videophone technology, character recognition, and photo enhancement. Rapid advances in digital imaging began with the introduction of MOS integrated circuits in the 1960s and microprocessors in the early 1970s, alongside progress in related computer memory storage, display technologies, and data compression algorithms. The invention of computerized axial tomography (CAT scanning), using x-rays to produce a digital image of a "slice" through a three-dimensional object, was of great importance to medical diagnostics. As well as origination of digital images, digitization of analog images allowed the enhancement and restoration of archaeological artifacts and began to be used in fields as diverse as nuclear medicine, astronomy, law enforcement, defence and industry. Advances in microprocessor technology paved the way for the development and marketing of charge-coupled devices (CCDs) for use in a wide range of image capture devices and gradually displaced the use of analog film and tape in photography and videography towards the end of the 20th century. The computing power necessary to process digital image capture also allowed computer-generated digital images to achieve a level of refinement close to photorealism. === Digital image sensors === The first semiconductor image sensor was the CCD, developed by Willard S. Boyle and George E. Smith at Bell Labs in 1969. While researching MOS technology, they realized that an electric charge was the analogy of the magnetic bubble and that it could be stored on a tiny MOS capacitor. As it was fairly straightforward to fabricate a series of MOS capacitors in a row, they connected a suitable voltage to them so that the charge could be stepped along from one to the next. The CCD is a semiconductor circuit that was later used in the first digital video cameras for television broadcasting. Early CCD sensors suffered from shutter lag. This was largely resolved with the invention of the pinned photodiode (PPD). It was invented by Nobukazu Teranishi, Hiromitsu Shiraki and Yasuo Ishihara at NEC in 1980. It was a photodetector structure with low lag, low noise, high quantum efficiency and low dark current. In 1987, the PPD began to be incorporated into most CCD devices, becoming a fixture in consumer electronic video cameras and then digital still cameras. Since then, the PPD has been used in nearly all CCD sensors and then CMOS sensors. The NMOS active-pixel sensor (APS) was invented by Olympus in Japan during the mid-1980s. This was enabled by advances in MOS semiconductor device fabrication, with MOSFET scaling reaching smaller micron and then sub-micron levels. The NMOS APS was fabricated by Tsutomu Nakamura's team at Olympus in 1985. The CMOS active-pixel sensor (CMOS sensor) was later developed by Eric Fossum's team at the NASA Jet Propulsion Laboratory in 1993. By 2007, sales of CMOS sensors had surpassed CCD sensors. === Digital image compression === An important development in digital image compression technology was the discrete cosine transform (DCT), a lossy compression technique first proposed by Nasir Ahmed in 1972. DCT compression is used in JPEG, which was introduced by the Joint Photographic Experts Group in 1992. JPEG compresses images down to much smaller file sizes, and has become the most widely used image file format on the Internet. == Mosaic == In digital imaging, a mosaic is a combination of non-overlapping images, arranged in some tessellation. Gigapixel images are an example of such digital image mosaics. Satellite imagery are often mosaicked to cover Earth regions. Interactive viewing is provided by virtual-reality photography.

    Read more →
  • Data security

    Data security

    Data security or data protection is the process of securing digital information to protect it from online threats. Data security or protection means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach. Data security protects computer hardware, software, storage devices, and the data of user devices. Data security also protects the data of organizations, companies and administrative controls. Data security guarantees the protection of individual data, such as identity documents and bank data, and protects against unauthorized access, theft and loss of individual data. Data security also protects data breaches that occurs in companies and industries. Good security measures in industries reduce the probability of data breaches, and employees can rely on the company with their data and private information to be kept secured while companies can continue to maintain a stable reputation. The CIA Triad (Confidentiality, Integrity, and Availability) is what is used to practice what an information security is required to follow. Confidentiality, protects information from being accessed by unauthorized persons. Integrity, makes sure data is trustworthy; and Availability, meaning that data can be accessed by approved users when it is needed; are three goals for data security. Non-repudiation in data security definition, is a device/service that shows where the data originated from and the proof of integrity. == Technologies == === Disk encryption === Disk encryption refers to encryption technology that encrypts data on a hard disk drive. It takes data from a storage device and coverts it into an unreadable format. Disk encryption typically takes form in either software (see disk encryption software) or hardware (see disk encryption hardware) which can be used together. Disk encryption is often referred to as on-the-fly encryption (OTFE) or transparent encryption. Full disk encryption encrypts each individual sector of a disk volume. Files and user data are encrypted to hinder unauthorized users from accessing without a decryption key. A diversifier permits a plaintext of a specific disk sector to be encrypted into different ciphertexts, which does not require additional storage, such as an initialization vector (IV) or message authentication code (MAC). === Software versus hardware-based mechanisms for protecting data === Software-based security solutions encrypt the data to protect it from theft. However, a malicious program or a hacker could corrupt the data to make it unrecoverable, making the system unusable. Hardware-based security solutions prevent read and write access to data, which provides very strong protection against tampering and unauthorized access. Hardware-based security or assisted computer security offers an alternative to software-only computer security. Security tokens such as those using PKCS#11 or a mobile phone may be more secure due to the physical access required in order to be compromised. Access is enabled only when the token is connected and the correct PIN is entered (see two-factor authentication). However, dongles can be used by anyone who can gain physical access to it. Newer technologies in hardware-based security solve this problem by offering full proof of security for data. Working off hardware-based security: A hardware device allows a user to log in, log out and set different levels through manual actions. Many devices use biometric technology to prevent malicious users from logging in, logging out, and changing privilege levels. The current state of a user of the device is read by controllers in peripheral devices such as hard disks. Illegal access by a malicious user or a malicious program is interrupted based on the current state of a user by hard disk and DVD controllers making illegal access to data impossible. Hardware-based access control is more secure than the protection provided by the operating systems as operating systems are vulnerable to malicious attacks by viruses and hackers. The data on hard disks can be corrupted after malicious access is obtained. With hardware-based protection, the software cannot manipulate the user privilege levels. A hacker or a malicious program cannot gain access to secure data protected by hardware or perform unauthorized privileged operations. This assumption is broken only if the hardware itself is malicious or contains a backdoor. The hardware protects the operating system image and file system privileges from being tampered with. Therefore, a completely secure system can be created using a combination of hardware-based security and secure system administration policies. === Backups === Backup is the process of reproducing copies of essential data and storing in a separate, secured place. It is used to ensure data that is lost can be recovered from another source. Backups contains a minimum of one copy of the data that requires preservation. It is considered essential to keep a backup of any data in most industries and the process is recommended for any files of importance to a user. There are 3 types of backups; full backups, incremental backups, and differential backups. Full backups secure all data from a production system, such as a server, database, or other connected data source. It is impossible to lose all data in a full backup if a breach or corruption were to occur. Full backups require a significantly large amount of time to back up and may be time-consuming taking hours to days to complete. Incremental backups only secures changed data since last backup. While all backups are done in full backups, incremental backups only save data that is recently or frequently changed. Incremental backups require lower storage costs making it a prominent solution for growing datasets. === Data Privacy === Data privacy (or information privacy) is the right for individual's data to be secured to obstruct the use of unauthorized access. It gives individuals control over their data and how it can be shared to third parties. The U.S Privacy Protection Law (see Privacy laws of the United States) requires organizations to inform individuals of how their data is collected and when a data breach occurs. By implementing an encryption, it ensures that private data is unreadable to cybercriminals. === Data masking === Data masking of structured data is the process of obscuring (masking) specific data within a database table or cell to ensure that data security is maintained and sensitive information is not exposed to unauthorized personnel. This may include masking the data from users (for example so banking customer representatives can only see the last four digits of a customer's national identity number), developers (who need real production data to test new software releases but should not be able to see sensitive financial data), outsourcing vendors, etc. Data masking is a form of encryption, as it obscures data by modifying particular letters and numbers to keep data concealed and protected from potential hackers. The individual that has access to the code that decrypts the replaced characters are the only ones that can uncover the data. === Data erasure === Data erasure (or data deletion, data destruction) is a method of software-based overwriting that permanently clears all electronic data residing on a hard drive or other digital media to ensure that no sensitive data is lost when an asset is retired or reused. Article 17: Right to be Forgotten states that users have the right to permanently remove all of their private information from their old devices/services to give people more control over their data. Users are able to switch between devices efficiently. == Threats == === Malware === Malware (or malicious software) is designed to destroy, corrupt or gain unauthorized access to a computer for the purpose of stealing, or destroying data. Hackers who use malware typically utilize many types of malware, which includes computer virus, computer worms, ransomware, spyware and Trojan horse to create a vast system of disruption and cause easy data theft. One of the victims of the vast system of disruption includes healthcare workers, who are targeted by compromised systems by infections and then having their data attacked. === Phishing === Phishing is a type of scam that allows hackers to hoax people using psychological and social engineering (using human emotions such as their trust and fear) tactics into giving personal data through emails and messages, and install computer viruses if the individual were to click on a malicious link unknowingly. Attackers are able to create websites that are very similar to original websites, which makes it difficult to detect a fake website, causing individuals to fall for giving in information. Phishing attackers use human emotion to exploit them, such as making them feel fear, urgency, sympathy with the message

    Read more →
  • Client-side encryption

    Client-side encryption

    Client-side encryption is the cryptographic technique of encrypting data on the sender's side, before it is transmitted to a server such as a cloud storage service. Client-side encryption features an encryption key that is not available to the service provider, making it difficult or impossible for service providers to decrypt hosted data. Client-side encryption allows for the creation of applications whose providers cannot access the data its users have stored, thus offering a high level of privacy. Applications utilizing client-side encryption are sometimes marketed under the misleading or incorrect term "zero-knowledge", but this is a misnomer, as the term zero-knowledge describes something entirely different in the context of cryptography. == Details == Client-side encryption seeks to eliminate the potential for data to be viewed by service providers (or third parties that compel service providers to deliver access to data), client-side encryption ensures that data and files that are stored in the cloud can only be viewed on the client-side of the exchange. This prevents data loss and the unauthorized disclosure of private or personal files, providing increased peace of mind for its users. Current recommendations by industry professionals as well as academic scholars offer great vocal support for developers to include client-side encryption to protect the confidentiality and integrity of information. === Examples of services that use client-side encryption by default === Tresorit MEGA Cryptee Cryptomator === Examples of services that optionally support client-side encryption === Apple iCloud offers optional client-side encryption when "Advanced Data Protection for iCloud" is enabled. Google Drive, Google Docs, Google Meet, Google Calendar, and Gmail — However, as of Jul 2024, optional client-side encryption features are only available to paid users. === Examples of services that do not support client-side encryption === Dropbox === Examples of client-side encrypted services that no longer exist === SpiderOak Backup

    Read more →
  • Interplanetary Internet

    Interplanetary Internet

    The interplanetary Internet is a conceived computer network in space, consisting of a set of network nodes that can communicate with each other. These nodes are the planet's orbiters and landers, and the Earth ground stations. For example, the orbiters collect the scientific data from the Curiosity rover on Mars through near-Mars communication links, transmit the data to Earth through direct links from the Mars orbiters to the Earth ground stations via the NASA Deep Space Network, and finally the data routed through Earth's internal internet. Interplanetary communication is greatly delayed by interplanetary distances, as data transmission can only go as fast as the speed of light, so a new set of protocols and technologies that are tolerant to large delays and errors are required. The interplanetary Internet has been envisioned as a store and forward network of internets that is often disconnected, has a wireless backbone fraught with error-prone links and delays ranging from tens of minutes to even hours, even when there is a connection. As of 2024 agencies and companies working towards bringing the network to fruition include NASA, ESA, SpaceX and Blue Origin. == Challenges and reasons == In the core implementation of Interplanetary Internet, satellites orbiting a planet communicate to other planet's satellites. Simultaneously, these planets revolve around the Sun with long distances, and thus many challenges face the communications. The reasons and the resultant challenges are: The motion and long distances between planets: The interplanetary communication is greatly delayed due to the interplanetary distances and the motion of the planets. The delay is variable and long, ranging from a couple of minutes (Earth-to-Mars), to a couple of hours (Pluto-to-Earth), depending on their relative positions. The interplanetary communication also suspends due to the solar conjunction, when the sun's radiation hinders the direct communication between the planets. As such, the communication characterizes lossy links and intermittent link connectivity. Low embeddable payload: Satellites can only carry a small payload, which poses challenges to the power, mass, size, and cost for communication hardware design. An asymmetric bandwidth would be the result of this limitation. This asymmetry reaches ratios up to 1000:1 as downlink:uplink bandwidth portion. Absence of fixed infrastructure: The graph of participating nodes in a specific planet-to-planet communication keeps changing over time, due to the constant motion. The routes of the planet-to-planet communication are planned and scheduled rather than being opportunistic. The Interplanetary Internet design must address these challenges to operate successfully and achieve good communication with other planets. It also must use the few available resources efficiently in the system. == Development == Space communication technology has steadily evolved from expensive, one-of-a-kind point-to-point architectures, to the re-use of technology on successive missions, to the development of standard protocols agreed upon by space agencies of many countries. This last phase has gone on since 1982 through the efforts of the Consultative Committee for Space Data Systems (CCSDS), a body composed of the major space agencies of the world. It has 11 member agencies, 32 observer agencies, and over 119 industrial associates. The evolution of space data system standards has gone on in parallel with the evolution of the Internet, with conceptual cross-pollination where fruitful, but largely as a separate evolution. Since the late 1990s, familiar Internet protocols and CCSDS space link protocols have integrated and converged in several ways; for example, the successful FTP file transfer to Earth-orbiting STRV 1B on January 2, 1996, which ran FTP over the CCSDS IPv4-like Space Communications Protocol Specifications (SCPS) protocols. Internet Protocol use without CCSDS has taken place on spacecraft, e.g., demonstrations on the UoSAT-12 satellite, and operationally on the Disaster Monitoring Constellation. Having reached the era where networking and IP on board spacecraft have been shown to be feasible and reliable, a forward-looking study of the bigger picture was the next phase. The Interplanetary Internet study at NASA's Jet Propulsion Laboratory (JPL) was started by a team of scientists at JPL led by internet pioneer Vinton Cerf and the late Adrian Hooke. Cerf was appointed as a distinguished visiting scientist at JPL in 1998, while Hooke was one of the founders and directors of CCSDS. While IP-like SCPS protocols are feasible for short hops, such as ground station to orbiter, rover to lander, lander to orbiter, probe to flyby, and so on, delay-tolerant networking is needed to get information from one region of the Solar System to another. It becomes apparent that the concept of a region is a natural architectural factoring of the Interplanetary Internet. A region is an area where the characteristics of communication are the same. Region characteristics include communications, security, the maintenance of resources, perhaps ownership, and other factors. The Interplanetary Internet is a "network of regional internets". What is needed then, is a standard way to achieve end-to-end communication through multiple regions in a disconnected, variable-delay environment using a generalized suite of protocols. Examples of regions might include the terrestrial Internet as a region, a region on the surface of the Moon or Mars, or a ground-to-orbit region. The recognition of this requirement led to the concept of a "bundle" as a high-level way to address the generalized Store-and-Forward problem. Bundles are an area of new protocol development in the upper layers of the OSI model, above the Transport Layer with the goal of addressing the issue of bundling store-and-forward information so that it can reliably traverse radically dissimilar environments constituting a "network of regional internets". Delay-tolerant networking (DTN) was designed to enable standardized communications over long distances and through time delays. At its core is the Bundle Protocol (BP), which is similar to the Internet Protocol, or IP, that serves as the heart of the Internet here on Earth. The big difference between the regular Internet Protocol (IP) and the Bundle Protocol is that IP assumes a seamless end-to-end data path, while BP is built to account for errors and disconnections — glitches that commonly plague deep-space communications. Bundle Service Layering, implemented as the Bundling protocol suite for delay-tolerant networking, will provide general-purpose delay-tolerant protocol services in support of a range of applications: custody transfer, segmentation and reassembly, end-to-end reliability, end-to-end security, and end-to-end routing among them. The Bundle Protocol was first tested in space on the UK-DMC satellite in 2008. An example of one of these end-to-end applications flown on a space mission is the CCSDS File Delivery Protocol (CFDP), used on the Deep Impact comet mission. CFDP is an international standard for automatic, reliable file transfer in both directions. CFDP should not be confused with Coherent File Distribution Protocol, which has the same acronym and is an IETF-documented experimental protocol for rapidly deploying files to multiple targets in a highly networked environment. In addition to reliably copying a file from one entity (such as a spacecraft or ground station) to another entity, CFDP has the capability to reliably transmit arbitrarily small messages defined by the user, in the metadata accompanying the file, and to reliably transmit commands relating to file system management that are to be executed automatically on the remote end-point entity (such as a spacecraft) upon successful reception of a file. == Protocol == The Consultative Committee for Space Data Systems (CCSDS) packet telemetry standard defines the protocol used for the transmission of spacecraft instrument data over the deep-space channel. Under this standard, an image or other data sent from a spacecraft instrument is transmitted using one or more packets. === CCSDS packet definition === A packet is a block of data with length that can vary between successive packets, ranging from 7 to 65,542 bytes, including the packet header. Packetized data is transmitted via frames, which are fixed-length data blocks. The size of a frame, including frame header and control information, can range up to 2048 bytes. Packet sizes are fixed during the development phase. Because packet lengths are variable but frame lengths are fixed, packet boundaries usually do not coincide with frame boundaries. === Telecom processing notes === Data in a frame is typically protected from channel errors by error-correcting codes. Even when the channel errors exceed the correction capability of the error-correcting code, the presence of errors is nearly always detected by the e

    Read more →
  • BeReal

    BeReal

    BeReal (stylized on the app logo as BeReal.) is a French social-networking app released in 2020, developed by Alexis Barreyat and Kévin Perreau. Currently, it is owned by Voodoo. Its main feature is a daily notification that encourages users to share photos of themselves in their day-to-day life, on any randomly selected two-minute window every day. Critics noted its emphasis on authenticity, which some felt crossed the line into the mundane. The primary reference of its name relates to its focus on users uploading unpolished photos, with it being a pun of the term B-reel. According to the app's description on Apple's App Store, BeReal encourages its users to "show their friends who they really are, for once," by removing filters and opportunities to stage or edit photos. After a couple of years of relative obscurity, it rapidly gained popularity in early and mid-2022 growing from 21.6 million to 73.5 million users between July and August, before experiencing a decrease in use in 2023 and continuing to decline to 23 million users at the beginning of 2024. == History == The app was developed by Alexis Barreyat, a former employee at GoPro, and Kévin Perreau, a graduate from 42 in Paris. Initially released in 2020, it first gained widespread popularity in early 2022. It first spread widely on college campuses, partially due to a paid ambassador program. In late August 2022, the application had over 10 million active daily users and 21.6 million active monthly users. As of February 2023, the app has grown to 13 million active daily users and 47.8 million active monthly users. In June 2021, BeReal received a $30 million funding round led by Andreessen Horowitz and Accel. In May 2022, BeReal secured $85 million in a funding round led by Yuri Milner's DST Global, increasing its valuation to about $600 million. On July 25, 2022, BeReal topped Apple's free app list in the iOS App Store, and remained until September 2022. BeReal also received Apple's iPhone App of the Year in 2022. By late spring 2023, the app's momentum was waning, as daily users dropped to about 6 million, from 15 million in October 2022. In August 2024, there was a resurgence after a campaign at the Paris Olympics 2024, with the app reportedly gaining 1000 users. In June 2024, BeReal was acquired by the French company Voodoo for a reported €500 million. Alexis Barreyat is set to step down after a transition period. == Features == Once per day, BeReal notifies all users that a two-minute window to post is open. It asks users to create a post (known eponymously as a "BeReal") which, using mandatory simultaneous photos and now short videos from both the front and back cameras, provides a visual depiction of what they are doing at that moment, with an option to caption their post. The given window varies from day to day, and is not known to users before the notification is received. Once the daily notification is sent, users lose the ability to see others' BeReals from the previous day. Furthermore, users cannot see any of the current day's BeReals until they upload their own. On-time BeReals show the time it was uploaded, meanwhile, late BeReals uploaded after the two-minute window shows how late the BeReal was taken, but the user has to long-press the BeReal to reveal the time it was uploaded. Other users can also see how many attempts the poster took to take the BeReal, as well as their location when the BeReal was taken. Users only get one chance to delete their BeReal and post another one, and they used to not be able to post more than one at any time. However, in 2023, a feature was added that allowed users to post up to two extra BeReals on days when they posted their first BeReal within the 2-minute window. In July 2024, the number of bonus BeReals was increased to 5. [1] BeReal also features a "Discovery" section, wherein users are given the option to share to a much wider, public audience. This feature, however, is limited, as users are not able to interact with the posts through commenting—unlike the "My Friends" feature. In August 2023, in an attempt to make BeReal more social, another feature was added so that users are now able to see their friends of friends' BeReal. The app reportedly uses HiveAI to automate its image moderation process. However, there is also a report function that allows users to report a photo or another user if they are posting inappropriate content. === Comparison to other platforms === Because of its daily cycle of engagement, it has been compared to Wordle, which gained popularity earlier in 2022. It also supports a platform similar to Snapchat with a theme of impermanence and brevity. BeReal has been described as designed to compete with Instagram while simultaneously de-emphasising social media addiction and overuse. The app does not allow any photo filters or other editing, and has no follower counts. Marketing material from the company said that the app "can be addictive" and that "BeReal won't make you famous." Jacob Arnott, managing director of social agency We the People, describes BeReal as "an anti-Instagram" due to its raw and unedited nature. The app's foundation on friends rather than followers resembles Facebook's platform of adding friends, which comprise the content of a user's feed. This also resembles Instagram's "close friends" story feature. Further, rather than "liking" posts, BeReal uses "RealMojis" which involves taking a photo to interact with other posts. With the popularity of BeReal, other providers have launched similar features. In July 2022, Instagram launched a "Dual Camera" feature similar to BeReal, and in August 2022 it began testing a feature called "IG Candid Challenges", where users are prompted to post once a day within two minutes. As of September 2022, TikTok has also launched a feature called TikTok Now, following the same concept. In December 2022, similar to Spotify's "Wrapped," BeReal launched a feature involving a video of a compilation of users' BeReal posts of 2022. == User characteristics == BeReal is considered to be targeted towards Generation Z users, and attempts to minimise "social media fatigue", a feeling of numbness and disconnection from reality caused by constant interaction with an idealised version of others. This is a "core generational value" that this demographic holds compared to Millennials. Further, BeReal's users have been particularly strong across universities and university-aged students, and the majority of users are in the United States, the United Kingdom, and Germany. In 2022, the majority of users were female, with 43.2% of users falling within the age range of 16 to 25 and 55.1% of users being 26 to 44 years old. BeReal, the platform encourages users to share their real time moments by sending a daily notification that gives a least two minutes to post a unedited photo using bot the front and back camera, although users can post later and retake photos from when the notification happens, this action are still visible to friends, reinforcing transparency and genuine in the moment sharing. == Reception == Jason Koebler, a writer for Vice, wrote that in contrast to Instagram, which presents an unattainable view of people's lives, BeReal instead "makes everyone look extremely boring". Niklas Myhr, a professor of social media at Chapman University, argued that depth of engagement may determine whether the app is a passing trend or has "staying power". Kelsey Weekman, a reporter for BuzzFeed News, noted that the app's unwillingness to "glamorise the banality of life" made it feel "humbling" in its emphasis on authenticity. Niloufar Haidari for The Guardian comments similarly that where the app succeeds in being "drab" in perhaps a positive way, it fails in potentially "un-inspiring" users. Likewise, Dr. Brad Ridout, a behavioral psychologist at the University of Sydney, emphasizes that the "boring" experience is what the creators are targeting for the app and, in response to Instagram's platform of flawlessness, that "perfection is the enemy of happiness". === Criticisms === Some people regularly post after the two-minute notification expires, leading to some criticism of the app, as the ability to post late undermines its aims of authenticity. In addition, BeReal's daily two-minute window has been argued to contribute to social media fatigue and a need for self-exposure, as well as constant access to phones.

    Read more →
  • Honey encryption

    Honey encryption

    Honey encryption is a type of data encryption that "produces a ciphertext, which, when decrypted with an incorrect key as guessed by the attacker, presents a plausible-looking yet incorrect plaintext." == Creators == Ari Juels and Thomas Ristenpart of the University of Wisconsin, the developers of the encryption system, presented a paper on honey encryption at the 2014 Eurocrypt cryptography conference. == Method of protection == A brute-force attack involves repeated decryption with random keys; this is equivalent to picking random plaintexts from the space of all possible plaintexts with a uniform distribution. This is effective because even though the attacker is equally likely to see any given plaintext, most plaintexts are extremely unlikely to be legitimate i.e. the distribution of legitimate plaintexts is non-uniform. Honey encryption defeats such attacks by first transforming the plaintext into a space such that the distribution of legitimate plaintexts is uniform. Thus an attacker guessing keys will see legitimate-looking plaintexts frequently and random-looking plaintexts infrequently. This makes it difficult to determine when the correct key has been guessed. In effect, honey encryption "[serves] up fake data in response to every incorrect guess of the password or encryption key." The security of honey encryption relies on the fact that the probability of an attacker judging a plaintext to be legitimate can be calculated (by the encrypting party) at the time of encryption. This makes honey encryption difficult to apply in certain applications e.g. where the space of plaintexts is very large or the distribution of plaintexts is unknown. It also means that honey encryption can be vulnerable to brute-force attacks if this probability is miscalculated. For example, it is vulnerable to known-plaintext attacks: if the attacker has a crib that a plaintext must match to be legitimate, they will be able to brute-force even Honey Encrypted data if the encryption did not take the crib into account. == Example == An encrypted credit card number is susceptible to brute-force attacks because not every string of digits is equally likely. The number of digits can range from 13 to 19, though 16 is the most common. Additionally, it must have a valid IIN and the last digit must match the checksum. An attacker can also take into account the popularity of various services: an IIN from MasterCard is probably more likely than an IIN from Diners Club Carte Blanche. Honey encryption can protect against these attacks by first mapping credit card numbers to a larger space where they match their likelihood of legitimacy. Numbers with invalid IINs and checksums are not mapped at all (i.e. have probability 0 of legitimacy). Numbers from large brands like MasterCard and Visa map to large regions of this space, while less popular brands map to smaller regions, etc. An attacker brute-forcing such an encryption scheme would only see legitimate-looking credit card numbers when they brute-force, and the numbers would appear with the frequency the attacker would expect from the real world. == Application == Juels and Ristenpart aim to use honey encryption to protect data stored on password manager services. Juels stated that "password managers are a tasty target for criminals," and worries that "if criminals get a hold of a large collection of encrypted password vaults they could probably unlock many of them without too much trouble." Hristo Bojinov, CEO and founder of Anfacto, noted that "Honey Encryption could help reduce their vulnerability. But he notes that not every type of data will be easy to protect this way. … Not all authentication or encryption system yield themselves to being honeyed."

    Read more →
  • Information leakage

    Information leakage

    Information leakage happens whenever a system that is designed to be closed to an eavesdropper reveals some information to unauthorized parties nonetheless. In other words: Information leakage occurs when secret information correlates with, or can be correlated with, observable information. For example, when designing an encrypted instant messaging network, a network engineer without the capacity to crack encryption codes could see when messages are transmitted, even if he could not read them. == Risk vectors == A modern example of information leakage is the leakage of secret information via data compression, by using variations in data compression ratio to reveal correlations between known (or deliberately injected) plaintext and secret data combined in a single compressed stream. Another example is the key leakage that can occur when using some public-key systems when cryptographic nonce values used in signing operations are insufficiently random. Bad randomness cannot protect proper functioning of a cryptographic system, even in a benign circumstance, it can easily produce crackable keys that cause key leakage. Information leakage can sometimes be deliberate: for example, an algorithmic converter may be shipped that intentionally leaks small amounts of information, in order to provide its creator with the ability to intercept the users' messages, while still allowing the user to maintain an illusion that the system is secure. This sort of deliberate leakage is sometimes known as a subliminal channel. Generally, only very advanced systems employ defenses against information leakage. Following are the commonly implemented countermeasures : Use steganography to hide the fact that a message is transmitted at all. Use chaffing to make it unclear to whom messages are transmitted (but this does not hide from others the fact that messages are transmitted). For busy re-transmitting proxies, such as a Mixmaster node: randomly delay and shuffle the order of outbound packets - this will assist in disguising a given message's path, especially if there are multiple, popular forwarding nodes, such as are employed with Mixmaster mail forwarding. When a data value is no longer going to be used, erase it from the memory.

    Read more →