AI Data Bias

AI Data Bias — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Patch management

    Patch management (or patch management policy or patch policy or patch management process) is concerned with the identification, acquisition, distribution, testing and installation of patches to systems. Proper patch management can be a net productivity boost for an organization. Patches can be used to defend against and eliminate potential vulnerabilities of a system, so that no threats may exploit them. Problems can arise during patch management, including buggy patches that either fail to fix their problem or introduce new issues. Patch management tools help orchestrate all of the procedures involved in patch management. == Description == Patch management is defined as a sub-practice of various disciplines including vulnerability management (part of security management), lifecycle management (with further possible sub-classification into application lifecycle management and release management), change management, and systems management. The practice is broadly concerned with the identification, acquisition, distribution, and installation of patches to systems. Some definitions treat patch management as a software-level practice, while others treat it as a systems-level process covering software, drivers, and firmware. == Cost–benefit analysis == While reserving time for patching takes up enterprise resources, there are balancing factors which can make proper patch management into a net productivity boost for an organization. Up-to-date systems often run more efficiently and at lower cost, with fewer errors, fewer security risks, and a better user workflow. Additionally, compliance with changing local and federal regulations is more likely to be achieved. Patching security vulnerabilities is one of many competing priorities for organizations, which leads some of them to take longer before patching.
Equifax implemented its 2015 patch management plan too slowly to mitigate or prevent the 2017 Equifax data breach, which led to scrutiny from regulators. == Relation to security management == Patches can be used to defend against and eliminate potential vulnerabilities of a system, so that no threats may exploit them; therefore, patch management can be considered a sub-discipline of vulnerability management. Every patchable device in a system presents an attack surface that must be secured. === Time plan === With automatic updates, patches are applied automatically with little to no action or planning required. This approach is recommended for many individuals and organizations. Some organizations also have to decide which patches to prioritize given limited resources. Patch Tuesday is the most common such schedule: major companies like Microsoft and Adobe release patches on a known date so that organizations can plan resources and apply the patches more quickly. Linux is open source, and patches can be released at any time, leading some to rely on mailing lists or other ways to be alerted to updates. === Inventory === Taking an inventory of software and hardware, including versions, can make it easier to correlate systems with bugs or patches as they become known. Taking stock of how much education and support others in an organization need to install their patches can also help in planning how to implement a patch, or how to design systems to begin with. Streamlining the process by using tools that can communicate with each other can also help to reduce the time of exposure to known vulnerabilities. == Challenges == Many problems can arise during patch management. A common issue is buggy patches, which either fail to fix their problem or introduce new issues. Another issue is deployment synchronization, since various subsystems may receive instructions to update at different times.
Similarly, the difficulty of patch management across many devices may grow at an uncontrollable rate depending on organizational size. One prominent demonstration of the challenges facing proper patch management was the buggy Falcon Sensor patch by CrowdStrike, which caused one of the worst IT outages of all time. == Implementations == A patch management tool (alternatively patch manager, patch management system, patch management software, or centralized patch management) helps orchestrate all of the procedures involved in patch management. Tools can be in-house (applied locally by local administrators), or external, as with managed service providers (applied externally by a provider). === Patch management software === Windows Update for Business, System Center Configuration Manager, and Windows Server Update Services offer control over patch deployment, with features enabling testing, scheduling updates, and setting custom configurations on Windows platforms. === Managed service providers === == Regulatory requirements (United States) == Timely patching of software vulnerabilities is a requirement under multiple regulatory frameworks in the United States. The Health Insurance Portability and Accountability Act (HIPAA) Security Rule requires covered entities to protect electronic protected health information (ePHI) by implementing security measures sufficient to reduce risks to a reasonable and appropriate level, which industry guidance has long interpreted to include timely patch management. A proposed new HIPAA Security Rule would make patch management requirements explicit, mandating that covered entities and business associates deploy security patches and updates within a defined risk-based timeline and maintain written procedures for prioritizing, testing, and applying patches to systems that store, process, or transmit ePHI. The 2025 proposal continues to receive industry pushback as of December 2025. HIPAA was last updated in 2013.
The Payment Card Industry Data Security Standard (PCI DSS) requires organizations to protect system components from known vulnerabilities by installing applicable security patches within one month of release for critical patches. The Cybersecurity and Infrastructure Security Agency (CISA) maintains a Known Exploited Vulnerabilities (KEV) catalog that compels U.S. federal agencies to remediate listed vulnerabilities within specified timelines. Agencies are typically required to patch within 3 weeks, though some vulnerabilities must be fixed within 24 hours.
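
The inventory and deadline ideas above can be combined in a small sketch. Everything in it is an assumption made for illustration: the `Advisory` record, the host and product names, and the 21-day window (borrowed from the "typically 3 weeks" figure above) are hypothetical and do not come from any real tool or catalog.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record: a product and its first fixed version."""
    product: str
    fixed_in: tuple
    published: date


def parse_version(text):
    """Turn a dotted version such as '1.2.3' into (1, 2, 3) for comparison."""
    return tuple(int(part) for part in text.split("."))


def pending_patches(inventory, advisories, today, window_days=21):
    """Return (host, product, due_date, overdue) for installs older than the fix."""
    findings = []
    for host, products in sorted(inventory.items()):
        for advisory in advisories:
            installed = products.get(advisory.product)
            if installed is None:
                continue  # product is not present on this host
            if parse_version(installed) < advisory.fixed_in:
                due = advisory.published + timedelta(days=window_days)
                findings.append((host, advisory.product, due, today > due))
    return findings


# Hypothetical inventory: host name -> {product name: installed version}.
inventory = {
    "web-01": {"examplelib": "1.2.3"},
    "db-01": {"examplelib": "1.4.0"},
}
advisories = [Advisory("examplelib", (1, 3, 0), date(2025, 1, 1))]
findings = pending_patches(inventory, advisories, today=date(2025, 1, 30))
```

Keeping versions in the inventory is what makes the comparison possible; without them, every host running the product would have to be treated as potentially affected.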

  • Data verification

    Data verification is a process in which different types of data are checked for accuracy and inconsistencies after data migration is done. In some domains, such as clinical trials, it is referred to as source data verification (SDV). Data verification helps to determine whether data was accurately translated when it was transferred from one source to another, whether it is complete, and whether it supports processes in the new system. During verification, there may be a need for a parallel run of both systems to identify areas of disparity and forestall erroneous data loss. Methods for data verification include double data entry, proofreading and automated verification of data. Proofreading data involves someone checking the data entered against the original document. This is also time-consuming and costly. Automated verification of data can be achieved using one-way hashes locally or through use of a SaaS-based service such as Q by SoLVBL to provide immutable seals to allow verification of the original data.
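
Local verification with one-way hashes can be sketched as follows. This is a minimal illustration, assuming records are available as bytes keyed by an identifier; the function names and the choice of SHA-256 are assumptions, not something the text above prescribes.

```python
import hashlib


def digest(record: bytes) -> str:
    """One-way hash of a record, used as its fingerprint."""
    return hashlib.sha256(record).hexdigest()


def verify_migration(source: dict, target: dict) -> dict:
    """Compare source and target records by key and by hash."""
    missing = sorted(key for key in source if key not in target)
    altered = sorted(
        key for key in source
        if key in target and digest(source[key]) != digest(target[key])
    )
    unexpected = sorted(key for key in target if key not in source)
    return {"missing": missing, "altered": altered, "unexpected": unexpected}


# Hypothetical records before and after a migration.
source = {"r1": b"alice,42", "r2": b"bob,17", "r3": b"carol,99"}
target = {"r1": b"alice,42", "r2": b"bob,71", "r4": b"dave,5"}
report = verify_migration(source, target)
```

In practice the source digests would be computed and stored before the migration, so that the check does not depend on the source system still being available.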

  • Backup

    In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of IT disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large. An information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive files. There are also different ways these devices can be arranged to provide geographic dispersion, data security, and portability. Data is selected, extracted, and manipulated for storage. The process can include methods for dealing with live data, including open files, as well as compression, encryption, and de-duplication. Additional techniques apply to enterprise client-server backup. Backup schemes may include dry runs that validate the reliability of the data being backed up. There are limitations and human factors involved in any backup scheme. == Storage == A backup strategy requires an information repository, "a secondary storage space for data" that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or relational database. 
=== 3-2-1 Backup Rule === The backup data needs to be stored, requiring a backup rotation scheme, which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include cloud storage). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to head crashes or damaged spindle motors since they do not have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Physically protected hard drives are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice. Because there is no perfect storage, many backup experts recommend maintaining a second copy on a local physical device, even if the data is also backed up offsite. === Backup methods === ==== Unstructured ==== An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation. 
==== Full only/System imaging ==== A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying system images, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems. ==== Incremental ==== An incremental backup stores data changed since a reference point in time. Duplicate copies of unchanged data are not copied. Typically a full backup of all files is made once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals. Some backup systems can create a synthetic full backup from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files. ==== Near-CDP ==== Continuous Data Protection (CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection. Near-CDP backup applications—often marketed as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary. Near-CDP backup applications use journaling and are typically based on periodic "snapshots", read-only copies of the data frozen at a particular point in time. Near-CDP (except for Apple Time Machine) intent-logs every change on the host system, often by saving byte or block-level differences rather than file-level differences. 
This backup method differs from simple disk mirroring in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting self-consistent files but requiring applications "be quiesced and made ready for backup." Near-CDP is more practicable for ordinary personal backup applications, as opposed to true CDP, which must be run in conjunction with a virtual machine or equivalent and is therefore generally used in enterprise client-server backups. Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended ".bak" extension to the file name. ==== Reverse incremental ==== A Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using hard links—as Apple Time Machine does, or using binary diffs. ==== Differential ==== A differential backup saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup. 
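
The difference between the incremental and differential methods comes down to which reference time is used when selecting changed files. The sketch below is illustrative only: it assumes changes are detected from last-modification times, and the function names and the sample timestamps are hypothetical.

```python
def changed_since(mtimes, reference):
    """Paths whose last-modification time is later than the reference time."""
    return {path for path, mtime in mtimes.items() if mtime > reference}


def plan_backup(mtimes, last_full, last_backup, kind):
    """Select files for a backup run.

    differential: everything changed since the last full backup
    incremental:  everything changed since the most recent backup of any type
    """
    if kind == "differential":
        reference = last_full
    elif kind == "incremental":
        reference = last_backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return changed_since(mtimes, reference)


# Hypothetical timeline: full backup at t=10, an incremental at t=20.
mtimes = {"a.txt": 5.0, "b.txt": 15.0, "c.txt": 25.0}
differential = plan_backup(mtimes, last_full=10.0, last_backup=20.0, kind="differential")
incremental = plan_backup(mtimes, last_full=10.0, last_backup=20.0, kind="incremental")
```

The differential set grows with time since the last full backup, which is why it takes longer to create but needs only two backups to restore.
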
A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification file attribute, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files. === Storage media === Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination. ==== Magnetic tape ==== Magnetic tape was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data. Tape is a sequential access medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space, tape drives are typically dozens of times as expensive as hard disk drives and optical drives. Tape media are generally rotated on a schedule so at least one set is off-site in case something should happe

  • Symmetric Boolean function

    In mathematics, a symmetric Boolean function is a Boolean function whose value does not depend on the order of its input bits, i.e., it depends only on the number of ones (or zeros) in the input. For this reason they are also known as Boolean counting functions. There are $2^{n+1}$ symmetric n-ary Boolean functions. Instead of the truth table, traditionally used to represent Boolean functions, one may use a more compact representation for an n-variable symmetric Boolean function: the (n + 1)-vector, whose i-th entry (i = 0, ..., n) is the value of the function on an input vector with i ones. Mathematically, the symmetric Boolean functions correspond one-to-one with the functions that map n+1 elements to two elements, $f:\{0,1,\dots,n\}\rightarrow\{0,1\}$. Symmetric Boolean functions are used to classify Boolean satisfiability problems. == Special cases == A number of special cases are recognized: the majority function, whose value is 1 on input vectors with more than n/2 ones; threshold functions, whose value is 1 on input vectors with k or more ones for a fixed k; the all-equal and not-all-equal functions, whose value is 1 when the inputs do (not) all have the same value; exact-count functions, whose value is 1 on input vectors with k ones for a fixed k; the one-hot or 1-in-n function, whose value is 1 on input vectors with exactly one one; the one-cold function, whose value is 1 on input vectors with exactly one zero; congruence functions, whose value is 1 on input vectors with the number of ones congruent to k mod m for fixed k, m; and the parity function, whose value is 1 if the input vector has an odd number of ones. The n-ary versions of AND, OR, XOR, NAND, NOR and XNOR are also symmetric Boolean functions. == Properties == In the following, $f_k$ denotes the value of the function $f:\{0,1\}^n\rightarrow\{0,1\}$ when applied to an input vector of weight $k$.
=== Weight === The weight of the function can be calculated from its value vector: $|f|=\sum_{k=0}^{n}\binom{n}{k}f_k$ === Algebraic normal form === The algebraic normal form either contains all monomials of a certain order $m$, or none of them; i.e. the Möbius transform $\hat{f}$ of the function is also a symmetric function. It can thus also be described by a simple (n+1)-bit vector, the ANF vector $\hat{f}_m$. The ANF and value vectors are related by a Möbius relation: $\hat{f}_m=\bigoplus_{k_2\subseteq m_2}f_k$ where $k_2\subseteq m_2$ denotes all the weights k whose base-2 representation is covered by the base-2 representation of m (a consequence of Lucas’ theorem). Effectively, an n-variable symmetric Boolean function corresponds to a log(n)-variable ordinary Boolean function acting on the base-2 representation of the input weight.
For example, for three-variable functions: $$\begin{array}{lcl}\hat{f}_0&=&f_0\\\hat{f}_1&=&f_0\oplus f_1\\\hat{f}_2&=&f_0\oplus f_2\\\hat{f}_3&=&f_0\oplus f_1\oplus f_2\oplus f_3\end{array}$$ So the three-variable majority function with value vector (0, 0, 1, 1) has ANF vector (0, 0, 1, 0), i.e.: $\text{Maj}(x,y,z)=xy\oplus xz\oplus yz$ === Unit hypercube polynomial === The coefficients of the real polynomial agreeing with the function on $\{0,1\}^n$ are given by: $f_m^{*}=\sum_{k=0}^{m}(-1)^{|k|+|m|}\binom{m}{k}f_k$ For example, the three-variable majority function polynomial has coefficients (0, 0, 1, -2): $\text{Maj}(x,y,z)=(xy+xz+yz)-2(xyz)$ == Examples ==
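
As a numerical check of the formulas above, the following sketch computes the weight, the ANF vector and the real-polynomial coefficients from a value vector. The function names are illustrative, and reading the exponent $|k|+|m|$ in the hypercube formula as the integer sum k + m is an assumption.

```python
from functools import reduce
from math import comb
from operator import xor


def weight(values):
    """Number of inputs mapped to 1: sum over k of C(n, k) * f_k."""
    n = len(values) - 1
    return sum(comb(n, k) * values[k] for k in range(n + 1))


def anf_vector(values):
    """ANF vector: XOR of f_k over all k whose binary digits are covered by m."""
    n = len(values) - 1
    return [
        # k is covered by m when every set bit of k is also set in m
        reduce(xor, (values[k] for k in range(n + 1) if k & m == k), 0)
        for m in range(n + 1)
    ]


def real_coefficients(values):
    """Coefficients of the real polynomial agreeing with f on {0,1}^n."""
    n = len(values) - 1
    return [
        sum((-1) ** (k + m) * comb(m, k) * values[k] for k in range(m + 1))
        for m in range(n + 1)
    ]


majority3 = [0, 0, 1, 1]          # value vector of the 3-variable majority
maj_weight = weight(majority3)
maj_anf = anf_vector(majority3)
maj_real = real_coefficients(majority3)
```

For the three-variable majority function this reproduces the ANF vector (0, 0, 1, 0) and the real coefficients (0, 0, 1, -2) quoted above.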

  • Aikuma

    Aikuma is an Android app for collecting speech recordings with time-aligned translations. The app includes a text-free interface for consecutive interpretation, designed for users who are not literate. Aikuma won the Grand Prize in the Open Source Software World Challenge (2013). == Name == Aikuma means "meeting place" in Usarufa, a Papuan language where this software was first used in 2012. == History == Aikuma was developed with sponsorship from the National Science Foundation, including a $101,501 (US) project, "to use mobile telephones to collect larger amounts of data on undocumented endangered languages than would never be possible through usual fieldwork." A modified version of the app, called Lig-Aikuma, has been developed at the Université Grenoble Alpes (LIG laboratory) and implements new features such as elicitation of speech from text, images and videos. Aikuma and Lig-Aikuma have been used for collecting substantial quantities of audio in remote indigenous villages. == Similar Software == Lingua Libre is an online collaborative project and tool by the Wikimedia France association, which can be used as a tool for language preservation. Lingua Libre enables the recording of words, phrases, or sentences of any language, oral (audio recording) or signed (video recording). It is a highly efficient method to record endangered languages since up to 1000 words can be recorded per hour. All the content is under a free license, and speakers of minority languages are encouraged to record their own dialects.

  • Social media surgery

    A social media surgery is a gathering at which volunteer "surgeons" with expertise in using web tools, chiefly social media, offer free advice in using such tools, to representatives ("patients") of non-profit organisations, charities, community groups and activists, with "no boring speeches or jargon". The idea was conceived by Pete Ashton, with Nick Booth of Podnosh Ltd, who ran the first such surgery in Birmingham, England, on 15 October 2008. In July 2009, a spin-off surgery (dubbed the "Social media mob") started in Mosman, Australia, and in January 2010, the first spin-off surgery in Africa was held. On 16 February 2012, it was announced that the Social Media Surgery movement had won "the Prime Minister’s Big Society Award". Prime Minister David Cameron said: This is an excellent initiative - such a simple idea and yet so effective. The popularity of these surgeries and the fact that they have inspired so many others across the country to follow in their footsteps, is testament to its brilliance. Congratulations to Nick and all the volunteers who have shared their time and expertise to help so many local groups make the most of the internet to support their community. A great example of the Big Society in action. The scheme also won the 2013 Adult Learners' Week "BBC Learning Through Technology Award".

  • Customer data management

    Customer data management (CDM) is the ways in which businesses keep track of their customer information and survey their customer base in order to obtain feedback. CDM includes a range of software or cloud computing applications designed to give large organizations rapid and efficient access to customer data. Surveys and data can be centrally located and widely accessible within a company, as opposed to being warehoused in separate departments. CDM encompasses the collection, analysis, organizing, reporting and sharing of customer information throughout an organization. Businesses need a thorough understanding of their customers’ needs if they are to retain and increase their customer base. Efficient CDM solutions provide companies with the ability to deal instantly with customer issues and obtain immediate feedback. As a result, customer retention and customer satisfaction can show marked improvement. According to a study by Aberdeen Group, "above-average and best-in-class companies... attain greater than 20% annual improvement in retention rates, revenues, data accuracy and partner/customer satisfaction rates." == Customer data management and cloud computing == Cloud computing offers an attractive choice for CDM in many companies due to its accessibility and cost-effectiveness. Businesses can decide who, within their company, should have the ability to create, adjust, analyze or share customer information. In December 2010, 52% of Information Technology (IT) professionals worldwide were deploying, or planning to deploy, cloud computing; this percentage is far higher in many countries. == Background == Customer data management, as a term, was coined in the 1990s, pre-dating the alternative term enterprise feedback management (EFM). CDM was introduced as a software solution that would replace earlier disc-based or paper-based surveys and spreadsheet data. 
Initially, CDM solutions were marketed to businesses as software, which were specific to one company, and often to one department within that company. This was superseded by application service providers (ASPs) where software was hosted for end user organizations, thus avoiding the necessity for IT professionals to deploy and support software. However, ASPs with their single-tenancy architecture were, in turn, superseded by software as a service (SaaS), engineered for multi-tenancy. By 2007 SaaS applications, giving businesses on-demand access to their customer information, were rapidly gaining popularity compared with ASPs. Cloud computing now includes SaaS and many prominent CDM providers offer cloud-based applications to their clients. In recent years, there has been a push away from the term EFM, with many of those working in this area advocating the slightly updated use of CDM. The return to the term CDM is largely based on the greater need for clarity around the solutions offered by companies, and on the desire to retire terminology veering on techno-jargon that customers may have a hard time understanding.

  • Knapsack problem

    The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively. The knapsack problem has been studied for more than a century, with early works dating back to 1897. The subset sum problem is a special case of the decision and 0-1 problems where for each kind of item, the weight equals the value: $w_i = v_i$. In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem. The subset sum problem is one of Karp's 21 NP-complete problems. == Applications == Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selection of investments and portfolios, selection of assets for asset-backed securitization, and generating keys for the Merkle–Hellman and other knapsack cryptosystems. One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. For small examples, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values, it is more difficult to provide choices.
Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score. A 1999 study of the Stony Brook University Algorithm Repository showed that, out of 75 algorithmic problems related to the field of combinatorial algorithms and algorithm engineering, the knapsack problem was the 19th most popular and the third most needed after suffix trees and the bin packing problem. == Definition == The most common problem being solved is the 0-1 knapsack problem, which restricts the number $x_i$ of copies of each kind of item to zero or one. Given a set of $n$ items numbered from 1 up to $n$, each with a weight $w_i$ and a value $v_i$, along with a maximum weight capacity $W$, maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$ and $x_i \in \{0,1\}$. Here $x_i$ represents the number of instances of item $i$ to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity.
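
A minimal sketch of the 0-1 formulation above, solved with the standard textbook dynamic programming table over capacities. It assumes non-negative integer weights and an integer capacity W; the function name and the sample items are illustrative.

```python
def knapsack_01(weights, values, capacity):
    """Maximum total value of a subset of items with total weight <= capacity."""
    # best[c] holds the best value achievable with capacity c
    # using only the items processed so far.
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


# Hypothetical instance: five items and a capacity of 15.
weights = [12, 2, 1, 1, 4]
values = [4, 2, 2, 1, 10]
optimum = knapsack_01(weights, values, 15)
```

The running time is proportional to n times W, which depends on the numeric value of W rather than on the length of its encoding.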
The bounded knapsack problem (BKP) removes the restriction that there is only one of each item, but restricts the number $x_i$ of copies of each kind of item to a maximum non-negative integer value $c$: maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$ and $x_i \in \{0,1,2,\dots,c\}$. The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item and can be formulated as above except that the only restriction on $x_i$ is that it is a non-negative integer: maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$ and $x_i \in \mathbb{N}$. One example of the unbounded knapsack problem is given using the figure shown at the beginning of this article and the text "if any number of each book is available" in the caption of that figure. == Computational complexity == The knapsack problem is interesting from the perspective of computer science for many reasons: The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete, thus there is no known algorithm that is both correct and fast (polynomial-time) in all cases. There is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V). This problem is co-NP-complete. There is a pseudo-polynomial time algorithm using dynamic programming. There is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine, described below. Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly.
There is a link between the "decision" and "optimization" problems in that if there exists a polynomial algorithm that solves the "decision" problem, then one can find the maximum value for the optimization problem in polynomial time by applying this algorithm iteratively while increasing the value of k, the target value in the decision question (called V above). On the other hand, if an algorithm finds the optimal value of the optimization problem in polynomial time, then the decision problem can be solved in polynomial time by comparing the value of the solution output by this algorithm with the value of k. Thus, both versions of the problem are of similar difficulty. One theme in research literature is to identify what the "hard" instances of the knapsack problem look like, or viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests. The goal in finding these "hard" instances is their use in public-key cryptography systems, such as the Merkle–Hellman knapsack cryptosystem. More generally, better understanding of the structure of the space of instances of an optimization problem helps to advance the study of the particular problem and can improve algorithm selection. Notably, the hardness of the knapsack problem also depends on the form of the input. If the weights and profits are given as integers, it is weakly NP-complete, while it is strongly NP-complete if the weights and profits are given as rational numbers. However, in the case of rational weights and profits it still admits a fully polynomial-time approximation scheme. === Unit-cost models === The NP-hardness of the knapsack problem relates to computational models in which the size of integers matters (such as the Turing machine). In contrast, decision trees count each decision as a single step.
Dobkin and Lipton show a $\frac{1}{2}n^{2}$ lower bound on linear decision trees for the knapsack problem, that is, trees where decision nodes test the sign of affine functions. This was generalized to algebraic decision trees by Steele and Yao. If the elements in the problem are real numbers or rationals, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). This model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions. An upper bound for a decision-tree model was given by Meyer auf der Heide, who showed that for every $n$ there exists an $O(n^{4})$-deep linear decision tree that solves the subset-sum problem with $n$ items. Note that this does not imply any upper bound for an algorithm that should solve the problem for any given $n$. == Solving == Several algorithms are available to solve knapsack problems, based on the dynamic programming approach, the branch and bound approach or hybridizations of both approaches. === Dynamic programming in-advance algorithm === The unbounded knapsack problem (UKP) places no restriction on the number of copies of each kind of item. Besides, here we assume that $x_i > 0$. $m[w'] = \max\left(\sum_{i=1}^{n} v_i x_i\right)$ subject to ∑

  • Hancom Office

    Hancom Office

    Hancom Office is a proprietary office suite that includes a word processor, spreadsheet software, presentation software, and a PDF editor, as well as online versions accessible via a web browser. It is primarily aimed at Korean users. Hancom Office is written in Java and C++ and runs on Android, iOS, macOS and Windows platforms. == Products == Hangul - a word processor developed by Hancom. It is a product that eliminates the inconvenience of the original Hangul word processor, which was limited to Hangul cards or PC models. Originally, the name was written using the '아래아' character, a vowel letter that is obsolete in modern Korean, and it was referred to as 'HWP' (an abbreviation for Hangul Word Processor), '아래아 한글' (Arae-a Hangul), '한/글' (Han/Geul), and so on. Hangul is currently the most widely used word processor in South Korea, often used alongside Microsoft Word. HanWord - a word processor compatible with Word. HanCell - a spreadsheet program. HanShow - a presentation program. Hancom Office Hanword Viewer - a viewer for documents created by Hancom Office or Microsoft Office.

  • Data custodian

    Data custodian

    In data governance groups, responsibilities for data management are increasingly divided between the business process owners and information technology (IT) departments. Two functional titles commonly used for these roles are data steward and data custodian. Data stewards are commonly responsible for data content, context, and associated business rules. Data custodians are responsible for the safe custody, transport, and storage of the data and for the implementation of business rules. Simply put, data stewards are responsible for what is stored in a data field, while data custodians are responsible for the technical environment and database structure. Common job titles for data custodians are database administrator (DBA), data modeler, ETL developer and data engineer. == Data custodian responsibilities == A data custodian ensures that: access to the data is authorized and controlled; data stewards are identified for each data set; technical processes sustain data integrity; processes exist for data quality issue resolution in partnership with data stewards; technical controls safeguard data; data added to data sets are consistent with the common data model; versions of master data are maintained along with the history of changes; change management practices are applied in maintenance of the database; and data content and changes can be audited.

  • Social collaboration

    Social collaboration

    Social collaboration refers to processes that help multiple people or groups interact and share information to achieve common goals. Such processes find their 'natural' environment on the Internet, where collaboration and social dissemination of information are made easier by current innovations and the proliferation of the web. Sharing concepts in a digital collaboration environment often facilitates a "brainstorming" process, where new ideas may emerge due to the varied contributions of individuals. These individuals may hail from different walks of life, different cultures and different age groups; their diverse thought processes help add new dimensions to ideas, dimensions that might previously have been missed. A crucial concept behind social collaboration is that 'ideas are everywhere.' Individuals are able to share their ideas in an unrestricted environment, as anyone can get involved and the discussion is not limited to those who have domain knowledge. Social collaboration is also known as enterprise social networking, and the products that support it are often branded enterprise social networks (ESNs). It is important to understand the rhythm of social collaboration. There needs to be a balance that makes it easy to move from focused solitary work to brainstorming for problem solving in group work. This balance can be achieved by creating structures or a work environment that is neither so rigid that it prevents brainstorming in group work nor so loose that it results in total chaos. Social collaboration should happen at the edge of chaos. Work practices should support social collaboration. The most effective environment is one that supports opportunistic planning. Opportunistic planning provides a general plan but leaves enough room for flexibility to change activities and tasks until the last moment. This way, people are able to cope with unforeseen developments without throwing everything away along with one grand plan.
== Comparison to social networking == Social collaboration is related to social networking, with the distinction that while social networking is individual-centric, social collaboration is entirely group-centric. Generally speaking, social networking means socializing for personal, professional or entertainment purposes, for example, LinkedIn and Facebook. Social collaboration, on the other hand, means working socially to achieve a common goal, for example, GitHub and Quora. Social networking services generally focus on individuals sharing messages in a more-or-less undirected way and receiving messages from many sources into a single personalized activity feed. Social collaboration services, on the other hand, focus on the identification of groups and collaboration spaces in which messages are explicitly directed at the group and the group activity feed is seen the same way by everyone. Social collaboration may refer to time-bound collaborations with an explicit goal to be completed or perpetual collaborations in which the goal is knowledge sharing (e.g. community of practice, online community). == Comparison to crowdsourcing == Social collaboration is similar to crowdsourcing as it involves individuals working together towards a common goal. Crowdsourcing is a method for harnessing specific information from a large, diverse group of people. Unlike social collaboration, which involves much communication and cooperation among a large group of people, crowdsourcing is more like individuals working towards the common goal relatively independently. Therefore, the process of working involves less communication. Andrea Grover, curator of a crowdsourcing art show, explained that collaboration among individuals is an appealing experience, because participation is "a low investment, with the possibility of a high return." 
== Social collaboration software == Notable social collaboration software includes Glip messaging, Google Apps, Knowledge Plaza Electronic Document System and Social Intranet, Microsoft Lync social collaboration tool for businesses, Slack, Weekdone for managers, and Wrike. == Future == Social collaboration is going to be used as a tool in companies to enhance productivity. Social workers could use social collaboration tools to manage personal tasks, professional projects and social networks with colleagues within the same organization. Social collaboration will serve as a platform to get people involved and connected. This kind of platform provides a spiritual training practice for social workers. Social collaboration software could help enhance the communication between customers and employees and build trust in the organization. When real-time chat is needed, every participant can be included in a shared, archived forum that keeps a record of important information and logs, so collaborators need not worry about losing important records while working towards the common goal. The interactive communication and synchronous environment promote understanding among colleagues. Collaboration helps in building strong relationships between workers, which in turn leads to faster problem solving. The close connection between workers and customers creates a scalable organization, which naturally increases the trust and faith that customers have in the company. The interactive customer relationship therefore raises customer satisfaction in ways that traditional collaboration methods cannot. Apart from its effect on the way work will be conducted in the future, social collaboration will also affect society. In the coming years social collaboration will be the driving force in societal change as more and more people work together to get their vision across to governments and governing agencies.
An example of this is Change.org, an online petition tool where users can help bring their government's attention to pressing social issues that need to be addressed.

  • Menu hack

    Menu hack

    A menu hack is a non-standard method of ordering food, usually at fast-food or fast casual restaurants, that offers a different result than what is explicitly stated on a menu. Menu hacks may range from a simple alternate flavor to "gaming the system" in order to obtain more food than normal. They are often spread on social media platforms such as TikTok, and are more popular with Generation Z, which has been known to customize their orders more than previous generations. Hacks are sometimes officially added to the menu after their popularity grows. However, in some cases, they have been criticized for overburdening fast food employees with outlandish requests, sparking debate as to whether certain menu hacks are unethical. The list of all possible menu hacks is called a secret menu. == History == The term "menu hack" stems from hacker culture and its tradition of overcoming previously imposed limitations. However, the tradition of ordering from a secret menu dates back to the early days of fast food. "Animal style" fries, a word of mouth menu item ordered from In-N-Out since the 1960s, was rumored to have been created by local surfers. In the Information Age, the rise of social media gave influencers the ability to communicate unique food combinations to their followers, which proved to go viral easily. Design mistakes in food ordering apps also proved to be easily exploitable. In some cases, these hacks boosted the profile of brands on social media, while in others, they caused financial harm when the company was unprepared to handle the sudden influx of unusual orders. One restaurant chain notable for the phenomenon is Chipotle Mexican Grill. A viral hack from Alexis Frost, suggesting a quesadilla with fajita vegetables inside, dipped in Chipotle vinaigrette mixed with sour cream, obtained 1.9 million views on TikTok, overloading the chain's workers, who had to work harder to prepare more vegetables and vinaigrette. 
Some restaurants began to deny the dish to customers, forcing them to order only meat and cheese on quesadillas. The company ultimately left the dish on the menu, but urged customers to stop ordering it via social media. When it later officially added the Fajita Quesadilla to the menu, digital sales nearly doubled. A method to order nachos, which are not officially on the menu, was also noted by customers. Starbucks is also famous for menu hacks, including the Pink Drink, a "Barbiecore" beverage in which coconut milk replaces the water in the strawberry açaí refresher. After it went viral, the company made it a permanent menu item and distributed it bottled in grocery stores. == Controversy == Menu hacks have been subject to a growing backlash, with employees stating that they "dread" younger customers due to the proliferation of unusual orders. Service industry workers, already overworked and underpaid, have cited the rise of menu hacks, and the difficulty of preparing them, as an additional reason to unionize and demand higher wages.

  • KeyBase

    KeyBase

    KeyBase is a database and web application for managing and deploying interactive taxonomic keys for plants and animals, developed by the Royal Botanic Gardens Victoria. KeyBase provides a medium in which pathway keys, which were traditionally developed for print and other classical types of media, can be used more effectively in the internet environment. The platform is built around "keys", which can easily be linked together, joined with other keys, or merged into larger, seamless groups of keys, with each key still available to be browsed independently. Keys in the KeyBase database can be filtered and displayed in a variety of ways and formats.

  • Business intelligence

    Business intelligence

    Business intelligence (BI) consists of strategies, methodologies, and technologies used by enterprises for data analysis and management of business information to inform business strategies and business operations. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. BI tools can handle large amounts of structured and sometimes unstructured data to help organizations identify, develop, and otherwise create new strategic business opportunities. They aim to allow for the easy interpretation of these big data. Identifying new opportunities and implementing an effective strategy based on insights is assumed to potentially provide businesses with a competitive market advantage and long-term stability, and help them take strategic decisions. Business intelligence can be used by enterprises to support a wide range of business decisions ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions involve priorities, goals, and directions at the broadest level. In all cases, business intelligence is considered most effective when it combines data from the market in which a company operates (external data) with data from internal company sources, such as financial and operational information. When integrated, external and internal data provide a comprehensive view that creates ‘intelligence’ not possible from any single data source alone. Among their many uses, business intelligence tools empower organizations to gain insight into new markets, to assess demand and suitability of products and services for different market segments, and to gauge the impact of marketing efforts. 
BI applications use data gathered from a data warehouse (DW) or from a data mart, and the concepts of BI and DW combine as "BI/DW" or as "BIDW". A data warehouse contains a copy of analytical data that facilitates decision support. == History == The earliest known use of the term business intelligence is in Richard Millar Devens' Cyclopædia of Commercial and Business Anecdotes (1865). Devens used the term to describe how the banker Sir Henry Furnese gained profit by receiving and acting upon information about his environment, prior to his competitors: "Throughout Holland, Flanders, France, and Germany, he maintained a complete and perfect train of business intelligence. The news of the many battles fought was thus received first by him, and the fall of Namur added to his profits, owing to his early receipt of the news." The ability to collect and react accordingly based on the information retrieved, Devens says, is central to business intelligence. When Hans Peter Luhn, a researcher at IBM, used the term business intelligence in an article published in 1958, he employed the Webster's Dictionary definition of intelligence: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal." In 1989, Howard Dresner (later a Gartner analyst) proposed business intelligence as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems." It was not until the late 1990s that this usage was widespread. == Definition == According to Solomon Negash and Paul Gray, business intelligence (BI) can be defined as systems that combine data gathering, data storage, and knowledge management with analysis to evaluate complex corporate and competitive information for presentation to planners and decision makers, with the objective of improving the timeliness and the quality of the input to the decision process.
According to Forrester Research, business intelligence is "a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making." Under this definition, business intelligence encompasses information management (data integration, data quality, data warehousing, master-data management, text- and content-analytics, et al.). Therefore, Forrester refers to data preparation and data usage as two separate but closely linked segments of the business-intelligence architectural stack. Some elements of business intelligence are: multidimensional aggregation and allocation; denormalization, tagging, and standardization; real-time reporting with analytical alerts; a method of interfacing with unstructured data sources; group consolidation, budgeting, and rolling forecasts; statistical inference and probabilistic simulation; key performance indicator optimization; version control and process management; and open item management. Forrester distinguishes this from the business-intelligence market, which is "just the top layers of the BI architectural stack, such as reporting, analytics, and dashboards." === Compared with competitive intelligence === Though the term business intelligence is sometimes a synonym for competitive intelligence (because they both support decision making), BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes while competitive intelligence gathers, analyzes, and disseminates information with a topical focus on company competitors. If understood broadly, competitive intelligence can be considered as a subset of business intelligence. === Compared with business analytics === Business intelligence and business analytics are sometimes used interchangeably, but there are alternate definitions.
Thomas Davenport, professor of information technology and management at Babson College argues that business intelligence should be divided into querying, reporting, Online analytical processing (OLAP), an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI focusing on statistics, prediction, and optimization, rather than the reporting functionality. == Unstructured data == Business operations can generate a very large amount of data in the form of emails, memos, notes from call centers, news, user groups, chats, reports, web pages, presentations, image files, video files, and marketing material. According to Merrill Lynch, more than 85% of all business information exists in these forms; a company might only use such a document a single time. Because of the way it is produced and stored, this information is either unstructured or semi-structured. The management of semi-structured data is an unsolved problem in the information technology industry. According to projections from Gartner (2003), white-collar workers spend 30–40% of their time searching, finding, and assessing unstructured data. BI uses both structured and unstructured data. The former is easy to search, and the latter contains a large quantity of the information needed for analysis and decision-making. Because of the difficulty of properly searching, finding, and assessing unstructured or semi-structured data, organizations may not draw upon these vast reservoirs of information, which could influence a particular decision, task, or project. This can ultimately lead to poorly informed decision-making. Therefore, when designing a business intelligence/DW solution, the specific problems associated with semi-structured and unstructured data must be accommodated, as well as those associated with structured data. === Limitations of semi-structured and unstructured data === There are several challenges to developing BI with semi-structured data. 
According to Inmon & Nesavich, some of those are: Physically accessing unstructured textual data – unstructured data is stored in a huge variety of formats. Terminology – Among researchers and analysts, there is a need to develop standardized terminology. Volume of data – As stated earlier, up to 85% of all data exists as semi-structured data. Couple that with the need for word-to-word and semantic analysis. Searchability of unstructured textual data – A simple search on some data, e.g. apple, results in links where there is a reference to that precise search term. (Inmon & Nesavich, 2008) gives an example: "a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies". === Metadata === To solve problems with searchability and assessment of data, it is necessary to know something about the content. This can be done by adding context through the use of metadata. Many systems already capture some metadata (e.g. filename, author, size, etc.), but more usef

  • Harvest now, decrypt later

    Harvest now, decrypt later

    Harvest now, decrypt later (HNDL) is a surveillance strategy that relies on the acquisition and long-term storage of currently unreadable encrypted data awaiting possible breakthroughs in decryption technology that would render it readable in the future—a hypothetical date referred to as Y2Q (a reference to Y2K), or Q-Day. The most common concern is the prospect of developments in quantum computing which would allow current strong encryption algorithms to be broken at some time in the future, making it possible to decrypt any stored material that had been encrypted using those algorithms. However, the improvement in decryption technology need not be due to a quantum-cryptographic advance; any other form of attack capable of enabling decryption would be sufficient. The existence of this strategy has led to concerns about the need to urgently deploy post-quantum cryptography; even though no practical quantum attacks yet exist, some data stored now may still remain sensitive even decades into the future. As of 2022, the U.S. federal government has proposed a roadmap for organizations to start migrating toward quantum-cryptography-resistant algorithms to mitigate these threats. This new version of Commercial National Security Algorithm Suite uses publicly-available algorithms and is allowed for government use up to the TOP SECRET level. == Terminology and scope == The term “harvest now, decrypt later” encompasses various surveillance or espionage operations in which ciphertext or encrypted communications are collected today with the view that they may one day be decrypted, given sufficient advances in computing power or cryptanalysis. The abbreviation HNDL is sometimes used in technical and policy documents. The “Y2Q” (or “Q-Day”) label draws an analogy to the Y2K date-change issue, emphasising a potential future point at which current cryptography may collapse. 
The strategy is particularly relevant for data with long confidentiality lifetimes, such as diplomatic communications, personal health records, critical infrastructure logs, or intellectual property. == Mitigation strategies == The primary defense against HNDL attacks is the transition to post-quantum cryptography (PQC), which utilizes algorithms believed to be secure against quantum computer attacks. However, because PQC protects the data payload digitally, rather than the transmission itself, the encrypted data can still be harvested and stored. A complementary approach involves physical layer security (also known as optical layer encryption or photonic shielding). Unlike algorithmic encryption, this method modifies the optical waveform itself—often by burying the signal within optical noise or using spectral phase encoding—to render the transmission unrecordable by standard receivers. By preventing the attacker from capturing a valid signal in the first place, this approach aims to eliminate the "harvest" phase of the threat. Commercial implementations of harvest-proof optical encryption have been developed by firms such as CyberRidge to secure long-haul fiber networks. Field trials have demonstrated 100 Gbps throughput over legacy DWDM networks using this method.
