AI Cv Review

AI Cv Review — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Kolmogorov–Arnold Networks

    Kolmogorov–Arnold Networks

    Kolmogorov–Arnold Networks (KANs) are a type of artificial neural network architecture inspired by the Kolmogorov–Arnold representation theorem, also known as the superposition theorem. Unlike traditional multilayer perceptrons (MLPs), which rely on fixed activation functions and linear weights, KANs replace each weight with a learnable univariate function, often represented using splines.

    == History ==

    KANs were proposed by Liu et al. (2024) as a generalization of the Kolmogorov–Arnold representation theorem (KART), aiming to outperform MLPs in small-scale AI and scientific tasks. Before KANs, numerous studies explored KART's connections to neural networks or used it as a basis for designing new network architectures.

    In the 1980s and 1990s, early research applied KART to neural network design. Kůrková et al. (1992), Hecht-Nielsen (1987), and Nees (1994) established theoretical foundations for multilayer networks based on KART. Igelnik et al. (2003) introduced the Kolmogorov Spline Network using cubic splines to model complex functions. Sprecher (1996, 1997) introduced numerical methods for building network layers, while Nakamura et al. (1993) created activation functions with guaranteed approximation accuracy. These works linked KART's theoretical potential with practical neural network implementation.

    KART has also been used in other computational and theoretical fields. Coppejans (2004) developed nonparametric regression estimators using B-splines, Bryant (2008) applied it to high-dimensional image tasks, and Liu (2015) investigated theoretical applications in optimal transport and image encryption. More recently, Polar and Poluektov (2021) used Urysohn operators for efficient KART construction, while Fakhoury et al. (2022) introduced ExSpliNet, integrating KART with probabilistic trees and multivariate B-splines for improved function approximation.
== Architecture ==

KANs are based on the Kolmogorov–Arnold representation theorem, which is linked to Hilbert's 13th problem. Given $x=(x_1,x_2,\dots,x_n)$ consisting of $n$ variables, a multivariate continuous function $f(x)$ can be represented as:

$$f(x)=f(x_1,\dots,x_n)=\sum_{q=1}^{2n+1}\Phi_q\left(\sum_{p=1}^{n}\varphi_{q,p}(x_p)\right) \qquad (1)$$

This formulation contains two nested summations: an outer and an inner sum. The outer sum $\sum_{q=1}^{2n+1}$ aggregates $2n+1$ terms, each involving a function $\Phi_q:\mathbb{R}\to\mathbb{R}$. The inner sum $\sum_{p=1}^{n}$ computes $n$ terms for each $q$, where each term $\varphi_{q,p}:[0,1]\to\mathbb{R}$ is a continuous function of the single variable $x_p$. The inner continuous functions $\varphi_{q,p}$ are universal, independent of $f$, while the outer functions $\Phi_q$ depend on the specific function $f$ being represented.

The representation (1) holds for all multivariate functions $f$. If $f$ is continuous, then the outer functions $\Phi_q$ are continuous; if $f$ is discontinuous, then the corresponding $\Phi_q$ are generally discontinuous, while the inner functions $\varphi_{q,p}$ remain the same universal functions.

Liu et al. proposed the name KAN.
A general KAN network consisting of $L$ layers takes $x$ and generates the output as:

$$\mathrm{KAN}(x)=(\Phi^{L-1}\circ\Phi^{L-2}\circ\cdots\circ\Phi^{1}\circ\Phi^{0})\,x \qquad (3)$$

Here, $\Phi^{l}$ is the function matrix of the $l$-th KAN layer or a set of pre-activations. Let $i$ denote the neuron of the $l$-th layer and $j$ the neuron of the $(l+1)$-th layer. The activation function $\varphi_{j,i}^{l}$ connects $(l,i)$ to $(l+1,j)$:

$$\varphi_{j,i}^{l},\quad l=0,\dots,L-1,\; i=1,\dots,n_{l},\; j=1,\dots,n_{l+1} \qquad (4)$$

where $n_l$ is the number of nodes of the $l$-th layer. Thus, the function matrix $\Phi^{l}$ can be represented as an $n_{l+1}\times n_{l}$ matrix of activations:

$$x^{l+1}=\begin{pmatrix}
\varphi_{1,1}^{l}(\cdot) & \varphi_{1,2}^{l}(\cdot) & \cdots & \varphi_{1,n_{l}}^{l}(\cdot)\\
\varphi_{2,1}^{l}(\cdot) & \varphi_{2,2}^{l}(\cdot) & \cdots & \varphi_{2,n_{l}}^{l}(\cdot)\\
\vdots & \vdots & \ddots & \vdots\\
\varphi_{n_{l+1},1}^{l}(\cdot) & \varphi_{n_{l+1},2}^{l}(\cdot) & \cdots & \varphi_{n_{l+1},n_{l}}^{l}(\cdot)
\end{pmatrix} x^{l}$$

== Implementations ==

To make the KAN layers optimizable, each activation function is formed as a combination of a spline and a basis function:

$$\varphi(x)=w_{b}\,b(x)+w_{s}\,\text{spline}(x)$$

where $b(x)$ is the basis function, usually defined as $\mathrm{silu}(x)=x/(1+e^{-x})$, and $w_{b}$ is the base weight matrix.
Also, $w_{s}$ is the spline weight matrix and $\text{spline}(x)$ is the spline function. The spline function can be a sum of B-splines:

$$\text{spline}(x)=\sum_{i}c_{i}B_{i}(x)$$

Many studies have suggested using other polynomial and curve functions instead of B-splines to create new KAN variants.

== Functions used ==

The choice of functional basis strongly influences the performance of KANs. Common function families include:

      • B-splines: Provide locality, smoothness, and interpretability; they are the most widely used in current implementations.
      • Radial basis functions (RBFs), including Gaussian RBFs: Capture localized features in data and are effective in approximating functions with non-linear or clustered structures.
      • Chebyshev polynomials: Offer efficient approximation with minimized error in the maximum norm, making them useful for stable function representation.
      • Rational functions: Useful for approximating functions with singularities or sharp variations, as they can model asymptotic behavior better than polynomials.
      • Fourier series: Capture periodic patterns effectively and are particularly useful in domains such as physics-informed machine learning.
      • Wavelet functions (DoG, Mexican hat, Morlet, and Shannon): Used for feature extraction as they can capture both high-frequency and low-frequency data components.
      • Piecewise linear functions: Provide efficient approximation for multivariate functions in KANs.

== Usage ==

In some modern neural architectures, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, KANs are typically used as drop-in substitutes for MLP layers. Despite KANs' general-purpose design, researchers have created and used them for a number of tasks:

      • Scientific machine learning (SciML): Function fitting, partial differential equations (PDEs), and physical/mathematical laws.
      • Continual learning: KANs better preserve previously learned information during incremental updates, avoiding catastrophic forgetting due to the locality of spline adjustments.
      • Graph neural networks: Extensions such as Kolmogorov–Arnold Graph Neural Networks (KA-GNNs) integrate KAN modules into message-passing architectures, showing improvements in molecular property prediction tasks.
      • Sensor data processing: KANs have recently been applied to sensor data processing due to their ability to model complex nonlinear relationships with relatively few parameters and improved interpretability compared to conventional multilayer perceptrons. Applications include industrial soft sensors, biomedical signal analysis, remote sensing, and environmental monitoring systems.

== Drawbacks ==

KANs can be computationally intensive and require a large number of parameters due to their use of polynomial functions to capture data.
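
As an illustration of the layer formula from the Implementations section, the following is a minimal pure-Python sketch, not the reference implementation by Liu et al. It assumes degree-1 B-splines (piecewise-linear "hat" functions) on a uniform knot grid in place of higher-order splines, scalar weights per edge, and random initial coefficients; the names `hat_basis`, `KANLayer`, and `kan_forward` are hypothetical, and no training loop is shown.

```python
import math
import random


def silu(x: float) -> float:
    """Base function b(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def hat_basis(x: float, knots: list) -> list:
    """Values B_i(x) of degree-1 B-splines (hat functions) centred on a
    uniform knot grid. Inside the grid they sum to 1; outside they decay to 0."""
    h = knots[1] - knots[0]
    return [max(0.0, 1.0 - abs(x - t) / h) for t in knots]


class KANLayer:
    """One KAN layer: out[j] = sum_i phi[j][i](x[i]), where
    phi(x) = w_b * silu(x) + w_s * sum_k c_k * B_k(x)."""

    def __init__(self, n_in: int, n_out: int, knots: list, seed: int = 0):
        rng = random.Random(seed)
        self.knots = knots
        # One base weight, one spline weight and one coefficient vector per edge (j, i).
        self.w_b = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        self.w_s = [[1.0 for _ in range(n_in)] for _ in range(n_out)]
        self.c = [[[rng.uniform(-0.1, 0.1) for _ in knots]
                   for _ in range(n_in)] for _ in range(n_out)]

    def phi(self, j: int, i: int, x: float) -> float:
        """Learnable univariate function on the edge from input i to output j."""
        basis = hat_basis(x, self.knots)
        spline = sum(c * b for c, b in zip(self.c[j][i], basis))
        return self.w_b[j][i] * silu(x) + self.w_s[j][i] * spline

    def forward(self, x: list) -> list:
        """Apply the n_out x n_in matrix of functions to the input vector."""
        return [sum(self.phi(j, i, xi) for i, xi in enumerate(x))
                for j in range(len(self.w_b))]


def kan_forward(layers: list, x: list) -> list:
    """Compose layers Phi^0, Phi^1, ..., Phi^(L-1) as in equation (3)."""
    for layer in layers:
        x = layer.forward(x)
    return x
```

A two-layer network of shape 2 → 3 → 1 would be built as `[KANLayer(2, 3, knots), KANLayer(3, 1, knots)]` and evaluated with `kan_forward`. In this sketch, inputs that fall outside the knot grid receive only the SiLU contribution, because the hat functions vanish there.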

    Read more →
  • Cut, copy, and paste

    Cut, copy, and paste

    Cut, copy, and paste are essential commands of modern human–computer interaction and user interface design. They offer an interprocess communication technique for transferring data through a computer's user interface. The cut command removes the selected data from its original position, and the copy command creates a duplicate; in both cases the selected data is kept in temporary storage called the clipboard. Clipboard data is later inserted wherever a paste command is issued. The data remains available to any application supporting the feature, thus allowing easy data transfer between applications.

    The command names are a skeuomorphic interface metaphor based on the physical procedure used in manuscript print editing to create a page layout on paper. The commands were pioneered in computing by Xerox PARC in 1974, and popularized by Apple Computer in the 1983 Lisa workstation and the 1984 Macintosh computer, and by a few home computer applications such as the 1984 word processor Cut & Paste.

    This interaction technique has close associations with related techniques in graphical user interfaces (GUIs) that use pointing devices such as a computer mouse (by drag and drop, for example). Typically, clipboard support is provided by an operating system as part of its GUI and widget toolkit.

    The capability to replicate information with ease, changing it between contexts and applications, raises privacy concerns because of the risks of disclosure when handling sensitive information. Terms like cloning, copy forward, carry forward, or re-use refer to the dissemination of such information through documents, and may be subject to regulation by administrative bodies.

    == History ==

    === Origins ===

    The term "cut and paste" comes from the traditional practice in manuscript editing, whereby people cut paragraphs from a page with scissors and paste them onto another page. This practice remained standard into the 1980s.
Stationery stores sold "editing scissors" with blades long enough to cut an 8½"-wide page. The advent of photocopiers made the practice easier and more flexible.

The act of copying or transferring text from one part of a computer-based document ("buffer") to a different location within the same or a different computer-based document was a part of the earliest on-line computer editors. As soon as computer data entry moved from punch cards to online files (in the mid/late 1960s) there were "commands" for accomplishing this operation. This mechanism was often used to transfer frequently used commands or text snippets from additional buffers into the document, as was the case with the QED text editor.

=== Early methods ===

The earliest editors (designed for teleprinter terminals) provided keyboard commands to delineate a contiguous region of text, then delete or move it. Since moving a region of text requires first removing it from its initial location and then inserting it into its new location, various schemes had to be invented to allow the user to specify this multi-step process. Often this was done with a "move" command, but some text editors required that the text first be put into some temporary location for later retrieval and placement. In 1983, the Apple Lisa became the first text editing system to call that temporary location "the clipboard".

Earlier control schemes such as NLS used a verb–object command structure, where the command name was provided first and the object to be copied or moved was second. The inversion from verb–object to object–verb on which copy and paste are based, where the user selects the object to be operated on before initiating the operation, was an innovation crucial for the success of the desktop metaphor, as it allowed copy and move operations based on direct manipulation.
=== Popularization ===

Lawrence G. "Larry" Tesler was inspired by early line and character editors, such as Pentti Kanerva's TV-Edit, that broke a move or copy operation into two steps, between which the user could invoke a preparatory action such as navigation. He proposed the names "cut" and "copy" for the first step and "paste" for the second step. Beginning in 1974, he and colleagues at Xerox PARC implemented several text editors that used cut/copy-and-paste commands to move and copy text.

Apple Computer popularized this paradigm with its Lisa (1983) and Macintosh (1984) operating systems and applications. The functions were mapped to key combinations using the ⌘ Command key as a special modifier, which is held down while also pressing X for cut, C for copy, or V for paste. These few keyboard shortcuts allow the user to perform all the basic editing operations, and the keys are clustered at the left end of the bottom row of the standard QWERTY keyboard.

These are the standard shortcuts:

      • Control-Z (or ⌘ Command+Z) to undo
      • Control-X (or ⌘ Command+X) to cut
      • Control-C (or ⌘ Command+C) to copy
      • Control-V (or ⌘ Command+V) to paste

The IBM Common User Access (CUA) standard also uses combinations of the Insert, Del, Shift and Control keys. Early versions of Windows used the IBM standard. Microsoft later also adopted the Apple key combinations in Windows, using the Control key as the modifier key.

Similar patterns of key combinations, later borrowed by others, are widely available in most GUI applications.

Cut, copy, and paste as originally implemented at PARC used a different workflow: with two windows on the same screen, the user could use the mouse to pick a point at which to make an insertion in one window (or a segment of text to replace). Then, by holding shift and selecting the copy source elsewhere on the same screen, the copy would be made as soon as the shift was released.
Similarly, holding shift and control would copy and cut (delete) the source. This workflow required far fewer keystrokes and mouse clicks than the current multi-step workflows, and did not require an explicit copy buffer. It was dropped, one presumes, because the displays of the original Apple and IBM GUIs did not have high enough density to permit multiple windows, as the PARC machines did, and so multiple simultaneous windows were rarely used.

== Cut and paste ==

Computer-based editing can involve very frequent use of cut-and-paste operations. Most software suppliers provide several methods for performing such tasks, and this can involve (for example) key combinations, pulldown menus, pop-up menus, or toolbar buttons.

      1. The user selects or "highlights" the text or file for moving by some method, typically by dragging over the text or file name with the pointing device or holding down the Shift key while using the arrow keys to move the text cursor.
      2. The user performs a "cut" operation via the key combination Ctrl+x (⌘+x for Macintosh users), a menu, or other means.
      3. Visibly, "cut" text immediately disappears from its location. "Cut" files typically change color to indicate that they will be moved.
      4. Conceptually, the text has now moved to a location often called the clipboard. The clipboard typically remains invisible. On most systems only one clipboard location exists, hence another cut or copy operation overwrites the previously stored information. Many UNIX text editors provide multiple clipboard entries, as do some Macintosh programs such as Clipboard Master, and Windows clipboard-manager programs such as the one in Microsoft Office.
      5. The user selects a location for insertion by some method, typically by clicking at the desired insertion point.
      6. A paste operation takes place which visibly inserts the clipboard text at the insertion point.
(The paste operation does not typically destroy the clipboard text: it remains available in the clipboard and the user can insert additional copies at other points).

Whereas cut-and-paste often takes place with a mouse-equivalent in Windows-like GUI environments, it may also occur entirely from the keyboard, especially in UNIX text editors, such as Pico or vi. Cutting and pasting without a mouse can involve a selection (for which Ctrl+x is pressed in most graphical systems) or the entire current line, but it may also involve text after the cursor until the end of the line and other more sophisticated operations.

The clipboard usually stays invisible, because the operations of cutting and pasting, while actually independent, usually take place in quick succession, and the user (usually) needs no assistance in understanding the operation or maintaining mental context. Some application programs provide a means of viewing, or sometimes even editing, the data on the clipboard.

== Copy and paste ==

The term "copy-and-paste" refers to the popular, simple method of reproducing text or other data from a source to a destination. It differs from cut and paste in that the original source text or data does not get deleted or removed. The popularity of this method stems from its simplicity and the ease with which users can move data between various applications visually – without resorting to permanent storage. Use in healthcare do
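
The cut, copy, and paste semantics described in the sections above can be sketched with a toy model. This is only an illustration under simple assumptions: a single-slot clipboard shared by plain-string documents, with selections given as character offsets. The `Clipboard` and `Document` classes are hypothetical and do not correspond to any operating system's clipboard API.

```python
class Clipboard:
    """Single-slot temporary storage: each cut or copy overwrites the content."""

    def __init__(self):
        self.content = ""


class Document:
    """A plain-text document attached to a shared clipboard."""

    def __init__(self, text: str, clipboard: Clipboard):
        self.text = text
        self.clipboard = clipboard

    def copy(self, start: int, end: int) -> None:
        # Copy duplicates the selection; the source text is left untouched.
        self.clipboard.content = self.text[start:end]

    def cut(self, start: int, end: int) -> None:
        # Cut stores the selection, then removes it from its original position.
        self.copy(start, end)
        self.text = self.text[:start] + self.text[end:]

    def paste(self, position: int) -> None:
        # Paste inserts the clipboard content without clearing the clipboard,
        # so the same content can be pasted repeatedly.
        self.text = (self.text[:position]
                     + self.clipboard.content
                     + self.text[position:])
```

Because two `Document` objects can share one `Clipboard`, a cut in one document followed by a paste in another mimics data transfer between applications.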

    Read more →
  • Sysomos

    Sysomos

    Sysomos Inc. is a Toronto-based social media analytics company owned by Meltwater. The company developed text analytics and machine learning technologies for user-generated content, and served 80% of the top agencies and Fortune 500 companies.

    == History ==

    Sysomos was founded by Nilesh Bansal and Nick Koudas. The company is a spinoff of the University of Toronto research project BlogScope. The BlogScope project, which started in 2005, resulted in the creation of the underlying content aggregation and analysis engine commercialized by Sysomos. The company raised venture capital in 2008 and was acquired by Marketwire in 2010.

    The company's original flagship product, Media Analysis Platform (MAP), mines and analyzes content from social media or user-generated content to create a picture of media coverage. Sysomos launched MAP in September 2007, followed by the addition of Heartbeat to its product suite in 2009. In addition to the two main products, in March 2010 the company released FourWhere, a free location-based social search service that mashes up Foursquare. Sysomos Heartbeat provides social media monitoring and engagement capabilities to communication professionals, brand managers, and customer support groups. In 2013, Heartbeat was extended with publishing components to deliver a complete end-to-end social media marketing platform.

    On July 6, 2010, it was announced that Marketwire, a press release distribution company, had acquired Sysomos. After the acquisition, Sysomos founders Nick Koudas and Nilesh Bansal left Sysomos to start Aislelabs.

    In February 2015, Sysomos split from Marketwired to become an independent company, and appointed Adnan Ahmed as the new CEO. In March 2015, the newly independent Sysomos launched a redesign of its Heartbeat product and a new API for its MAP product. In the same year, the company acquired Expion. In September 2016, Peter Heffring was announced as the new CEO.
In April 2017, Sysomos showcased a new unified platform offering new insights. In April 2018, media monitoring firm Meltwater announced it had acquired Sysomos. The CEO of Sysomos, Peter Heffring, said the company would continue to operate as an independent unit of Meltwater and that he would run the social analytics division of Meltwater.

== Reports ==

The Inside Twitter series of reports is the most extensive third-party survey on Twitter's growth and demographics. Another extensive survey, covering the top 5% of most active Twitter users, found that over 25% of all tweets are machine-created. The report also confirms Twitter's international growth.

The Inside Facebook Pages report found that only four percent of pages have more than 10,000 fans, 0.76% of pages have more than 100,000 fans, and 0.05% of pages (or 297 in total) have more than a million fans.

The Inside YouTube reports focus more on video hosting services and YouTube.

    Read more →
  • Business intelligence

    Business intelligence

    Business intelligence (BI) consists of strategies, methodologies, and technologies used by enterprises for data analysis and management of business information to inform business strategies and business operations. Common functions of BI technologies include reporting, online analytical processing, analytics, dashboard development, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. BI tools can handle large amounts of structured and sometimes unstructured data to help organizations identify, develop, and otherwise create new strategic business opportunities. They aim to allow for the easy interpretation of these big data. Identifying new opportunities and implementing an effective strategy based on insights is assumed to potentially provide businesses with a competitive market advantage and long-term stability, and help them take strategic decisions. Business intelligence can be used by enterprises to support a wide range of business decisions ranging from operational to strategic. Basic operating decisions include product positioning or pricing. Strategic business decisions involve priorities, goals, and directions at the broadest level. In all cases, business intelligence is considered most effective when it combines data from the market in which a company operates (external data) with data from internal company sources, such as financial and operational information. When integrated, external and internal data provide a comprehensive view that creates ‘intelligence’ not possible from any single data source alone. Among their many uses, business intelligence tools empower organizations to gain insight into new markets, to assess demand and suitability of products and services for different market segments, and to gauge the impact of marketing efforts. 
BI applications use data gathered from a data warehouse (DW) or from a data mart, and the concepts of BI and DW combine as "BI/DW" or as "BIDW". A data warehouse contains a copy of analytical data that facilitates decision support.

== History ==

The earliest known use of the term business intelligence is in Richard Millar Devens' Cyclopædia of Commercial and Business Anecdotes (1865). Devens used the term to describe how the banker Sir Henry Furnese gained profit by receiving and acting upon information about his environment, prior to his competitors: "Throughout Holland, Flanders, France, and Germany, he maintained a complete and perfect train of business intelligence. The news of the many battles fought was thus received first by him, and the fall of Namur added to his profits, owing to his early receipt of the news." The ability to collect information and react accordingly, Devens says, is central to business intelligence.

When Hans Peter Luhn, a researcher at IBM, used the term business intelligence in an article published in 1958, he employed the Webster's Dictionary definition of intelligence: "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal."

In 1989, Howard Dresner (later a Gartner analyst) proposed business intelligence as an umbrella term to describe "concepts and methods to improve business decision making by using fact-based support systems." It was not until the late 1990s that this usage was widespread.

== Definition ==

According to Solomon Negash and Paul Gray, business intelligence (BI) can be defined as systems that combine:

      • Data gathering
      • Data storage
      • Knowledge management

with analysis to evaluate complex corporate and competitive information for presentation to planners and decision makers, with the objective of improving the timeliness and the quality of the input to the decision process.
According to Forrester Research, business intelligence is "a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making." Under this definition, business intelligence encompasses information management (data integration, data quality, data warehousing, master-data management, text and content analytics, etc.). Therefore, Forrester refers to data preparation and data usage as two separate but closely linked segments of the business-intelligence architectural stack.

Some elements of business intelligence are:

      • Multidimensional aggregation and allocation
      • Denormalization, tagging, and standardization
      • Real-time reporting with analytical alerts
      • A method of interfacing with unstructured data sources
      • Group consolidation, budgeting, and rolling forecasts
      • Statistical inference and probabilistic simulation
      • Key performance indicators optimization
      • Version control and process management
      • Open item management

Forrester distinguishes this from the business-intelligence market, which is "just the top layers of the BI architectural stack, such as reporting, analytics, and dashboards."

=== Compared with competitive intelligence ===

Though the term business intelligence is sometimes a synonym for competitive intelligence (because they both support decision making), BI uses technologies, processes, and applications to analyze mostly internal, structured data and business processes, while competitive intelligence gathers, analyzes, and disseminates information with a topical focus on company competitors. If understood broadly, competitive intelligence can be considered a subset of business intelligence.

=== Compared with business analytics ===

Business intelligence and business analytics are sometimes used interchangeably, but there are alternate definitions.
Thomas Davenport, professor of information technology and management at Babson College, argues that business intelligence should be divided into querying, reporting, online analytical processing (OLAP), an "alerts" tool, and business analytics. In this definition, business analytics is the subset of BI focusing on statistics, prediction, and optimization, rather than the reporting functionality.

== Unstructured data ==

Business operations can generate a very large amount of data in the form of emails, memos, notes from call centers, news, user groups, chats, reports, web pages, presentations, image files, video files, and marketing material. According to Merrill Lynch, more than 85% of all business information exists in these forms; a company might only use such a document a single time. Because of the way it is produced and stored, this information is either unstructured or semi-structured.

The management of semi-structured data is an unsolved problem in the information technology industry. According to projections from Gartner (2003), white-collar workers spend 30–40% of their time searching, finding, and assessing unstructured data. BI uses both structured and unstructured data. The former is easy to search, and the latter contains a large quantity of the information needed for analysis and decision-making. Because of the difficulty of properly searching, finding, and assessing unstructured or semi-structured data, organizations may not draw upon these vast reservoirs of information, which could influence a particular decision, task, or project. This can ultimately lead to poorly informed decision-making.

Therefore, when designing a business intelligence/DW solution, the specific problems associated with semi-structured and unstructured data must be accommodated, as well as those associated with structured data.

=== Limitations of semi-structured and unstructured data ===

There are several challenges to developing BI with semi-structured data.
According to Inmon & Nesavich, some of those are:

      • Physically accessing unstructured textual data – Unstructured data is stored in a huge variety of formats.
      • Terminology – Among researchers and analysts, there is a need to develop standardized terminology.
      • Volume of data – As stated earlier, up to 85% of all data exists as semi-structured data. Couple that with the need for word-to-word and semantic analysis.
      • Searchability of unstructured textual data – A simple search on some data, e.g. apple, results in links where there is a reference to that precise search term. Inmon & Nesavich (2008) give an example: "a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies".

=== Metadata ===

To solve problems with searchability and assessment of data, it is necessary to know something about the content. This can be done by adding context through the use of metadata. Many systems already capture some metadata (e.g. filename, author, size, etc.), but more usef
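
The searchability problem in the felony example can be made concrete with a small Python sketch. It contrasts a literal keyword search with a search that first expands the query using metadata about how terms relate. The sample documents, the `TAXONOMY` mapping, and both function names are invented for illustration and are not taken from Inmon & Nesavich.

```python
# Hypothetical unstructured documents, keyed by an invented document id.
DOCUMENTS = {
    "d1": "The suspect was charged with a felony.",
    "d2": "Police investigated an arson at the warehouse.",
    "d3": "Quarterly sales report for the northern region.",
    "d4": "The accountant was convicted of embezzlement.",
}

# Hypothetical metadata: a broad term mapped to the narrower terms it covers.
TAXONOMY = {
    "felony": ["arson", "murder", "embezzlement", "vehicular homicide"],
}


def simple_search(term: str, documents: dict) -> list:
    """Return ids of documents that contain the literal search term."""
    needle = term.lower()
    return [doc_id for doc_id, text in documents.items()
            if needle in text.lower()]


def expanded_search(term: str, documents: dict, taxonomy: dict) -> list:
    """Return ids of documents that contain the term or any narrower term
    listed for it in the taxonomy."""
    needles = [term.lower()] + [t.lower() for t in taxonomy.get(term.lower(), [])]
    return [doc_id for doc_id, text in documents.items()
            if any(n in text.lower() for n in needles)]
```

With this sample data, `simple_search("felony", DOCUMENTS)` matches only `d1`, while `expanded_search("felony", DOCUMENTS, TAXONOMY)` also returns the arson and embezzlement documents. Substring matching is deliberately crude here; a real system would need tokenization and a much richer vocabulary.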

    Read more →
  • Ericom Connect

    Ericom Connect

    Ericom Connect is a remote access/application publishing solution produced by Ericom Software that provides secure, centrally managed access to physical or hosted desktops and applications running on Microsoft Windows and Linux systems.

    == Product overview ==

    Ericom Connect is desktop virtualization and application virtualization software that allows users to run applications remotely, without installing them on the local computer or device. The software is noted for its scalability, ease of deployment, and compatibility with any type of infrastructure, cloud or physical.

    Ericom Connect uses AccessPad (native client for desktops), AccessToGo (native client for mobile), or AccessNow, one of the first HTML5 RDP solutions to support clientless access to Windows desktops and applications from any device with an HTML5-compatible browser, including Macintosh computers, mobile devices, and Google Chromebooks.

    Other notable features include performance monitoring, built-in real-time analytics and BI, support for two-factor authentication (using RSA SecurID), multi-tenancy and multi-datacenter support via a single unified web interface, and a “Launch Simulation” feature that allows users to visualize and simulate actual step-by-step user processes directly from within the administration console. Beyond scalability, distributing configurations, logs, and similar data across multiple servers means there is no single point of failure, as can be the case if all configuration information is stored on one server.

    == History ==

    Ericom Connect was introduced in 2015 as a successor to Ericom PowerTerm Web Connect. PowerTerm Web Connect used an architecture similar to what was then current with Citrix and VMware, relying on a centralized SQL server, a connection broker, image management for different hypervisors, and a variety of clients. Ericom Connect uses a new grid architecture that provides more scalability, reliability, and flexibility than before.

    Read more →
  • Content format

    Content format

    A content format is an encoded format for converting a specific type of data to displayable information. Content formats are used in recording and transmission to prepare data for observation or interpretation. This includes both analog and digitized content. Content formats may be recorded and read by either natural or manufactured tools and mechanisms. In addition to converting data to information, a content format may include the encryption and/or scrambling of that information. Multiple content formats may be contained within a single section of a storage medium (e.g. track, disk sector, computer file, document, page, column) or transmitted via a single channel (e.g. wire, carrier wave) of a transmission medium. With multimedia, multiple tracks containing multiple content formats are presented simultaneously. Content formats may either be recorded in secondary signal processing methods such as a software container format (e.g. digital audio, digital video) or recorded in the primary format (e.g. spectrogram, pictogram). Observable data is often known as raw data, or raw content. A primary raw content format may be directly observable (e.g. image, sound, motion, smell, sensation) or physical data which only requires hardware to display it, such as a phonographic needle and diaphragm or a projector lamp and magnifying glass. The following are examples of some common content formats and content format categories (covering: sensory experience, model, and language used for encoding information):

    Read more →
  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is an operation that marks the end of a transaction and makes its changes permanent, contributing to the Atomicity, Consistency, Isolation, and Durability (ACID) guarantees of transactions. Commit records are stored in the commit log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is discarding the transaction's tentative changes, which is known as a rollback. Due to the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) first proposed by Jim Gray, which is the foundation of distributed transaction management. Subsequently, the Three-Phase Commit (3PC), Presumed Commit (PC), Presumed Abort (PA), and Optimistic Commit protocols gradually emerged, addressing the problems of blocking and fault recovery. Today, commit protocols play a significant role in newer fields such as e-commerce payment and blockchain technology. By handling transactions effectively and supporting fault recovery, commit protocols are crucial to the reliability and consistency of data management. == History == The concept of commit originated in the late 1960s and early 1970s, when computer technology was rapidly advancing and data management was becoming an important requirement in business and finance. Enterprises gradually replaced traditional paper records with computers, which improved work efficiency and made the reliability and consistency of data a necessity. Transaction management at this stage was relatively simple and limited to processing on a single computer. It merely recorded changes to the data so that the data remained stable after a transaction completed or terminated. 
In the late 1970s, as database systems moved from operating on a single computer to multiple collaborating distributed nodes, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the two-phase commit protocol (2PC), which became an effective solution for distributed transaction management, handling data synchronization between multiple nodes. However, this protocol can block transactions when nodes fail. In the early 1980s, researchers found that although the two-phase commit protocol was effective at synchronizing data, it could lead to long waits and even system crashes. To address this, researchers began to explore new methods, including improving efficiency by reducing the number of messages exchanged during the protocol. IBM's R* database introduced the Presumed Commit and Presumed Abort protocols, which greatly improved the efficiency of distributed transaction processing by reducing communication overhead and became an important breakthrough in transaction commit protocols. By the early 1990s, with increasing business demands and transaction complexity, enterprises required higher efficiency in distributed transaction processing. To suit different environments, researchers gradually developed variants of commit protocols that offer more flexible transaction management options. For example, the three-phase commit protocol reduces the occurrence of blocking by adding a pre-commit phase and a timeout mechanism. 
In the 21st century, with the spread of the mobile Internet and wireless technology, commit protocols have developed further, and researchers have begun to study how to reduce blocking during transactions in order to cope with limited bandwidth, battery life, and network instability in mobile environments. The optimistic commit protocol marks the extension of commit technology from traditional databases to the emerging field of mobile data. This protocol allows transactions to temporarily use unconfirmed data, improving the user experience under poor network conditions. In recent years, with the rise of blockchain and decentralized technologies, commit protocols and consensus mechanisms have gradually merged. These consensus algorithms help make data tamper-resistant and guard against malicious attacks on nodes in a decentralized environment. As a result, commit is no longer confined to traditional database management but has become a core technology of trust computing and distributed ledgers, further expanding its applications in the digital age. Through verification by the consensus mechanism, each transaction can be committed and tracked globally, which provides an important technical foundation for the circulation of digital assets, the operation of cryptocurrencies, and decentralized applications. == Commit Protocol Types == In data management, a transaction is a series of database operations that belong together, such as those making up a bank transfer or an order submission. To ensure the accuracy, consistency, and security of the data, a transaction is either completed in full or cancelled in full, leaving no partially completed results. A commit protocol is the method used to coordinate this process. 
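As a hedged illustration of the all-or-nothing behaviour described above, the following sketch uses Python's built-in sqlite3 module on a single in-memory database. The `accounts` table, the `transfer` helper, and the account names are hypothetical, introduced only for this example; a distributed commit protocol would have to coordinate the same commit-or-rollback decision across several nodes.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`; commit both updates or neither."""
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()      # make both updates permanent
        return True
    except Exception:
        conn.rollback()    # discard the tentative changes
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw, so it is rolled back
```

After both calls the balances are 70 and 80: the failed transfer leaves no partial update behind, because its two tentative `UPDATE` statements are discarded together by the rollback.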
Different protocols suit different scenarios, and each has its own advantages and disadvantages. There are four major commit protocols. === Two-Phase Commit (2PC) === The two-phase commit protocol is the most classic and most widely used approach to distributed transactions. It consists of a preparation phase and a commit phase and is designed to let the coordinator determine whether all participating nodes agree. In the preparation phase, the coordinator sends a prepare-to-commit request to all nodes participating in the transaction. In the commit phase, the transaction is committed globally once all participating nodes have reported that they are ready; if agreement is not reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is simple to operate and widely used, its obvious drawback is that transactions can be blocked for a long time when nodes fail: participants can neither terminate nor continue immediately, and system performance declines. === Three-Phase Commit (3PC) === The three-phase commit protocol is a non-blocking refinement of 2PC that is divided into three stages: preparation, pre-commit, and commit. First, the coordinator sends a "prepare" request to each node. After the nodes confirm, a "pre-commit" stage follows, in which each node has completed most of the preparatory work and waits for the final confirmation. Finally, in the commit stage, the coordinator sends the "commit" request to all nodes and the transaction is completed and committed. Compared with 2PC, it adds a timeout mechanism, avoids the blocking caused by a single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly improves transaction reliability, but adds overhead for message transmission and state maintenance. 
It is best suited to distributed applications in which transactions are highly sensitive and long waits are unacceptable. === Presumed Commit (PC) and Presumed Abort (PA) === Presumed Commit (PC) assumes by default that a transaction will commit successfully; a rollback is announced only if an anomaly is encountered. This reduces the message overhead and logging costs of normal commits. Presumed Abort (PA) assumes that the default state of a transaction is rollback; the transaction is committed only when all nodes have explicitly agreed. This protocol is applicable to transactions that are not updated frequently or that have a low probability of committing successfully. The IBM R* distributed database management system was the first to propose and implement the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management. === Optimistic Commit Protocol === With the rise of the Internet, the earlier commit protocols face new challenges, especially in mobile scenarios with unstable networks, where excessively long transaction waiting times can affect the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing in order to avoid wait times. This type of commit is suitable f

    Read more →
  • Data lake

    Data lake

    A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video). A data lake can be established on premises (within an organization's data centers) or in the cloud (using cloud services). == Background == James Dixon, then chief technology officer at Pentaho, coined the term by 2011 to contrast it with data mart, which is a smaller repository of interesting attributes derived from raw data. In promoting data lakes, he argued that data marts have several inherent problems, such as information siloing. PricewaterhouseCoopers (PwC) said that data lakes could "put an end to data silos". In their study on data lakes, they noted that enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository." == Examples == Many companies use cloud storage services such as Google Cloud Storage and Amazon S3 or a distributed file system such as Apache Hadoop distributed file system (HDFS). There is a gradual academic interest in the concept of data lakes. For example, Personal DataLake at Cardiff University is a new type of data lake which aims at managing big data of individual users by providing a single point of collecting, organizing, and sharing personal data. Early data lakes, such as Hadoop 1.0, had limited capabilities because it only supported batch-oriented processing (Map Reduce). 
Interacting with it required expertise in Java, map reduce and higher-level tools like Apache Pig, Apache Spark and Apache Hive (which were also originally batch-oriented). == Criticism == Poorly managed data lakes have been facetiously called data swamps. In June 2015, David Needle characterized "so-called data lakes" as "one of the more controversial ways to manage big data". PwC was also careful to note in their research that not all data lake initiatives are successful. They quote Sean Martin, CTO of Cambridge Semantics: We see customers creating big data graveyards, dumping everything into Hadoop distributed file system (HDFS) and hoping to do something with it down the road. But then they just lose track of what’s there. The main challenge is not creating a data lake, but taking advantage of the opportunities it presents. They describe companies that build successful data lakes as gradually maturing their lake as they figure out which data and metadata are important to the organization. Another criticism is that the term data lake is used with many different meanings. It may be used to refer to, for example: any tools or data management practices that are not data warehouses; a particular technology for implementation; a raw data reservoir; a hub for ETL offload; or a central hub for self-service analytics. While critiques of data lakes are warranted, in many cases they apply to other data projects as well. For example, the definition of data warehouse is also changeable, and not all data warehouse efforts have been successful. In response to various critiques, McKinsey noted that the data lake should be viewed as a service model for delivering business value within the enterprise, not a technology outcome. == Data lakehouses == Data lakehouses are a hybrid approach that can ingest a variety of raw data formats like a data lake, while also providing ACID transactions and enforced data quality like a data warehouse.

    Read more →
  • Superintelligence ban

    Superintelligence ban

    Superintelligence ban refers to proposed legal, ethical, or policy measures intended to restrict or prohibit the development of artificial superintelligence, AI systems that would surpass human cognitive abilities in nearly all domains. The idea arises from concerns that such systems could become uncontrollable, potentially posing existential threats to humanity or causing severe social and economic disruption. == Background == The concept of limiting or banning superintelligence research has roots in early 21st-century debates on artificial general intelligence (AGI) safety. Thinkers such as Nick Bostrom and Eliezer Yudkowsky warned that self-improving AI could rapidly exceed human oversight. As advanced models like large-scale language models and autonomous agents began demonstrating complex reasoning abilities, policymakers and ethicists increasingly discussed the need for legal constraints on the creation of systems capable of recursive self-improvement. In October 2025, the Future of Life Institute published a statement calling for "a prohibition on the development of superintelligence, not lifted before there is broad scientific consensus that it will be done safely and controllably, and strong public buy-in." This statement was signed by various public personalities, such as Richard Branson and Steve Wozniak, and AI experts, such as Yoshua Bengio and Geoffrey Hinton. == Rationale == Supporters of a superintelligence ban argue that once AI systems surpass human intelligence, traditional containment, alignment, and control methods may fail. They contend that even limited experimentation with such systems could lead to irreversible outcomes, including loss of human decision-making power or unintended global harm. Some propose international treaties modeled after the nuclear non-proliferation framework to prevent a competitive AI arms race. 
Opponents argue that a ban would be difficult to define and enforce, given the lack of a precise threshold distinguishing advanced AGI from superintelligence. They also warn that excessive restriction could slow scientific progress, hinder beneficial automation, and encourage unregulated underground research. == Global discussion == Although no government has enacted an explicit superintelligence ban, the idea has been debated within the European Union, United Nations, and several independent AI safety organizations. The Future of Life Institute, Center for AI Safety, and other organizations have called for international cooperation to manage risks associated with the pursuit of superintelligent systems. In 2024 and 2025, proposals for a temporary moratorium on frontier AI research were circulated among major technology firms and research institutes, reflecting growing public concern over the trajectory of AI capabilities.

    Read more →
  • Knapsack problem

    Knapsack problem

    The knapsack problem is the following problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine which items to include in the collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. The problem often arises in resource allocation where the decision-makers have to choose from a set of non-divisible projects or tasks under a fixed budget or time constraint, respectively. The knapsack problem has been studied for more than a century, with early works dating back to 1897. The subset sum problem is a special case of the decision and 0-1 problems where for each kind of item, the weight equals the value: $w_i = v_i$. In the field of cryptography, the term knapsack problem is often used to refer specifically to the subset sum problem. The subset sum problem is one of Karp's 21 NP-complete problems. == Applications == Knapsack problems appear in real-world decision-making processes in a wide variety of fields, such as finding the least wasteful way to cut raw materials, selection of investments and portfolios, selection of assets for asset-backed securitization, and generating keys for the Merkle–Hellman and other knapsack cryptosystems. One early application of knapsack algorithms was in the construction and scoring of tests in which the test-takers have a choice as to which questions they answer. For small examples, it is a fairly simple process to provide the test-takers with such a choice. For example, if an exam contains 12 questions each worth 10 points, the test-taker need only answer 10 questions to achieve a maximum possible score of 100 points. However, on tests with a heterogeneous distribution of point values, it is more difficult to provide choices. 
Feuerman and Weiss proposed a system in which students are given a heterogeneous test with a total of 125 possible points. The students are asked to answer all of the questions to the best of their abilities. Of the possible subsets of problems whose total point values add up to 100, a knapsack algorithm would determine which subset gives each student the highest possible score. A 1999 study of the Stony Brook University Algorithm Repository showed that, out of 75 algorithmic problems related to the field of combinatorial algorithms and algorithm engineering, the knapsack problem was the 19th most popular and the third most needed after suffix trees and the bin packing problem. == Definition == The most common problem being solved is the 0-1 knapsack problem, which restricts the number $x_i$ of copies of each kind of item to zero or one. Given a set of $n$ items numbered from 1 up to $n$, each with a weight $w_i$ and a value $v_i$, along with a maximum weight capacity $W$, maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$ and $x_i \in \{0,1\}$. Here $x_i$ represents the number of instances of item $i$ to include in the knapsack. Informally, the problem is to maximize the sum of the values of the items in the knapsack so that the sum of the weights is less than or equal to the knapsack's capacity. 
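The 0-1 formulation above can be made concrete with a short Python sketch. It fills a standard dynamic-programming table over capacities, which is one common way to solve small instances with integer weights and is not a method prescribed by the definition itself; the function name `knapsack_01` and the sample items are illustrative assumptions.

```python
def knapsack_01(weights, values, capacity):
    """Return (best_value, chosen_indices) for the 0-1 knapsack problem.

    Assumes non-negative integer weights and a non-negative integer capacity.
    """
    n = len(weights)
    # best[i][w] = max value using only the first i items with weight limit w
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wi, vi = weights[i - 1], values[i - 1]
        for w in range(capacity + 1):
            best[i][w] = best[i - 1][w]          # case x_i = 0
            if wi <= w:                          # case x_i = 1, if the item fits
                best[i][w] = max(best[i][w], best[i - 1][w - wi] + vi)
    # Walk back through the table to recover which x_i are 1.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return best[n][capacity], sorted(chosen)


# Hypothetical instance: four items, capacity W = 7.
value, items = knapsack_01(weights=[1, 3, 4, 5], values=[1, 4, 5, 7], capacity=7)
```

For this instance the sketch returns a value of 9 with the items at indices 1 and 2 (weights 3 and 4), which exactly fill the capacity of 7.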
The bounded knapsack problem (BKP) removes the restriction that there is only one of each item, but restricts the number $x_i$ of copies of each kind of item to a maximum non-negative integer value $c$: maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$ and $x_i \in \{0,1,2,\dots,c\}$. The unbounded knapsack problem (UKP) places no upper bound on the number of copies of each kind of item and can be formulated as above except that the only restriction on $x_i$ is that it is a non-negative integer: maximize $\sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$ and $x_i \in \mathbb{N}$. One example of the unbounded knapsack problem is given using the figure shown at the beginning of this article and the text "if any number of each book is available" in the caption of that figure. == Computational complexity == The knapsack problem is interesting from the perspective of computer science for many reasons: The decision problem form of the knapsack problem (Can a value of at least V be achieved without exceeding the weight W?) is NP-complete, thus there is no known algorithm that is both correct and fast (polynomial-time) in all cases. There is no known polynomial algorithm which can tell, given a solution, whether it is optimal (which would mean that there is no solution with a larger V). This problem is co-NP-complete. There is a pseudo-polynomial time algorithm using dynamic programming. There is a fully polynomial-time approximation scheme, which uses the pseudo-polynomial time algorithm as a subroutine, described below. Many cases that arise in practice, and "random instances" from some distributions, can nonetheless be solved exactly. 
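The pseudo-polynomial dynamic-programming approach mentioned above can be sketched for the unbounded variant as follows. This is a minimal Python sketch that assumes positive integer weights; the name `unbounded_knapsack` and the sample data are hypothetical. Its running time is proportional to $nW$, which is polynomial in the numeric value of $W$ but not in the number of bits needed to write $W$ down, hence "pseudo-polynomial".

```python
def unbounded_knapsack(weights, values, capacity):
    """Max total value when each item may be taken any number of times.

    Assumes positive integer weights and a non-negative integer capacity.
    """
    # m[w] = best value achievable with total weight at most w
    m = [0] * (capacity + 1)
    for w in range(1, capacity + 1):
        for wi, vi in zip(weights, values):
            if wi <= w:
                # Take one more copy of this item on top of the best
                # solution for the remaining capacity w - wi.
                m[w] = max(m[w], m[w - wi] + vi)
    return m[capacity]


# Hypothetical instance: three item kinds, capacity W = 7.
best = unbounded_knapsack(weights=[2, 3, 4], values=[3, 5, 6], capacity=7)
```

For this instance the best value is 11, reached for example by two copies of the first item and one copy of the second (total weight 7).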
There is a link between the "decision" and "optimization" problems in that if there exists a polynomial algorithm that solves the "decision" problem, then one can find the maximum value for the optimization problem in polynomial time by applying this algorithm iteratively while increasing the value of k. On the other hand, if an algorithm finds the optimal value of the optimization problem in polynomial time, then the decision problem can be solved in polynomial time by comparing the value of the solution output by this algorithm with the value of k. Thus, both versions of the problem are of similar difficulty. One theme in research literature is to identify what the "hard" instances of the knapsack problem look like, or viewed another way, to identify what properties of instances in practice might make them more amenable than their worst-case NP-complete behaviour suggests. The goal in finding these "hard" instances is for their use in public-key cryptography systems, such as the Merkle–Hellman knapsack cryptosystem. More generally, better understanding of the structure of the space of instances of an optimization problem helps to advance the study of the particular problem and can improve algorithm selection. Furthermore, notable is the fact that the hardness of the knapsack problem depends on the form of the input. If the weights and profits are given as integers, it is weakly NP-complete, while it is strongly NP-complete if the weights and profits are given as rational numbers. However, in the case of rational weights and profits it still admits a fully polynomial-time approximation scheme. === Unit-cost models === The NP-hardness of the Knapsack problem relates to computational models in which the size of integers matters (such as the Turing machine). In contrast, decision trees count each decision as a single step. 
Dobkin and Lipton show a $\frac{1}{2}n^2$ lower bound on linear decision trees for the knapsack problem, that is, trees where decision nodes test the sign of affine functions. This was generalized to algebraic decision trees by Steele and Yao. If the elements in the problem are real numbers or rationals, the decision-tree lower bound extends to the real random-access machine model with an instruction set that includes addition, subtraction and multiplication of real numbers, as well as comparison and either division or remaindering ("floor"). This model covers more algorithms than the algebraic decision-tree model, as it encompasses algorithms that use indexing into tables. However, in this model all program steps are counted, not just decisions. An upper bound for a decision-tree model was given by Meyer auf der Heide who showed that for every $n$ there exists an $O(n^4)$-deep linear decision tree that solves the subset-sum problem with $n$ items. Note that this does not imply any upper bound for an algorithm that should solve the problem for any given $n$. == Solving == Several algorithms are available to solve knapsack problems, based on the dynamic programming approach, the branch and bound approach or hybridizations of both approaches. === Dynamic programming in-advance algorithm === The unbounded knapsack problem (UKP) places no restriction on the number of copies of each kind of item. Besides, here we assume that $x_i > 0$. $m[w'] = \max\left(\sum_{i=1}^{n} v_i x_i\right)$ subject to ∑

    Read more →
  • Data security

    Data security

    Data security or data protection is the process of securing digital information to protect it from online threats. It means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach. Data security protects computer hardware, software, storage devices, and the data on user devices. It also protects the data of organizations and companies and their administrative controls. Data security safeguards individual data, such as identity documents and bank data, against unauthorized access, theft, and loss. Data security also protects against data breaches that occur in companies and industries. Good security measures reduce the probability of data breaches, so that employees can trust the company to keep their data and private information secure while the company maintains a stable reputation. The CIA Triad (Confidentiality, Integrity, and Availability) describes the three goals that information security is expected to meet. Confidentiality protects information from being accessed by unauthorized persons; integrity makes sure data is trustworthy; and availability means that data can be accessed by approved users when it is needed. Non-repudiation, in data security, refers to a device or service that shows where data originated and provides proof of its integrity. == Technologies == === Disk encryption === Disk encryption refers to encryption technology that encrypts data on a hard disk drive. It takes data from a storage device and converts it into an unreadable format. Disk encryption typically takes the form of either software (see disk encryption software) or hardware (see disk encryption hardware), which can be used together. 
Disk encryption is often referred to as on-the-fly encryption (OTFE) or transparent encryption. Full disk encryption encrypts each individual sector of a disk volume. Files and user data are encrypted to hinder unauthorized users from accessing without a decryption key. A diversifier permits a plaintext of a specific disk sector to be encrypted into different ciphertexts, which does not require additional storage, such as an initialization vector (IV) or message authentication code (MAC). === Software versus hardware-based mechanisms for protecting data === Software-based security solutions encrypt the data to protect it from theft. However, a malicious program or a hacker could corrupt the data to make it unrecoverable, making the system unusable. Hardware-based security solutions prevent read and write access to data, which provides very strong protection against tampering and unauthorized access. Hardware-based security or assisted computer security offers an alternative to software-only computer security. Security tokens such as those using PKCS#11 or a mobile phone may be more secure due to the physical access required in order to be compromised. Access is enabled only when the token is connected and the correct PIN is entered (see two-factor authentication). However, dongles can be used by anyone who can gain physical access to it. Newer technologies in hardware-based security solve this problem by offering full proof of security for data. Working off hardware-based security: A hardware device allows a user to log in, log out and set different levels through manual actions. Many devices use biometric technology to prevent malicious users from logging in, logging out, and changing privilege levels. The current state of a user of the device is read by controllers in peripheral devices such as hard disks. 
Illegal access by a malicious user or a malicious program is interrupted by hard disk and DVD controllers based on the current state of the user, making illegal access to data impossible. Hardware-based access control is more secure than the protection provided by operating systems, as operating systems are vulnerable to malicious attacks by viruses and hackers. The data on hard disks can be corrupted after malicious access is obtained. With hardware-based protection, the software cannot manipulate the user privilege levels. A hacker or a malicious program cannot gain access to secure data protected by hardware or perform unauthorized privileged operations. This assumption is broken only if the hardware itself is malicious or contains a backdoor. The hardware protects the operating system image and file system privileges from being tampered with. Therefore, a completely secure system can be created using a combination of hardware-based security and secure system administration policies. === Backups === Backup is the process of making copies of essential data and storing them in a separate, secured place. It is used to ensure that lost data can be recovered from another source. A backup contains at least one copy of the data that requires preservation. In most industries it is considered essential to keep a backup of any data, and the process is recommended for any files of importance to a user. There are three types of backups: full backups, incremental backups, and differential backups. Full backups secure all data from a production system, such as a server, database, or other connected data source. With a full backup, all of the data can be restored if a breach or corruption occurs. Full backups can be time-consuming, taking hours to days to complete. Incremental backups secure only the data that has changed since the last backup. 
While full backups copy all data, incremental backups save only data that has recently changed. Incremental backups have lower storage costs, making them a prominent solution for growing datasets. === Data Privacy === Data privacy (or information privacy) is the right of individuals to have their data secured against unauthorized access and use. It gives individuals control over their data and how it can be shared with third parties. The U.S. Privacy Protection Law (see Privacy laws of the United States) requires organizations to inform individuals of how their data is collected and when a data breach occurs. Encryption ensures that private data is unreadable to cybercriminals. === Data masking === Data masking of structured data is the process of obscuring (masking) specific data within a database table or cell to ensure that data security is maintained and sensitive information is not exposed to unauthorized personnel. This may include masking the data from users (for example so banking customer representatives can only see the last four digits of a customer's national identity number), developers (who need real production data to test new software releases but should not be able to see sensitive financial data), outsourcing vendors, etc. Data masking is a form of encryption, as it obscures data by modifying particular letters and numbers to keep data concealed and protected from potential hackers. Only individuals who have access to the code that decrypts the replaced characters can uncover the data. === Data erasure === Data erasure (or data deletion, data destruction) is a method of software-based overwriting that permanently clears all electronic data residing on a hard drive or other digital media to ensure that no sensitive data is exposed when an asset is retired or reused. 
Article 17 of the General Data Protection Regulation (GDPR), the Right to be Forgotten, states that users have the right to permanently remove all of their private information from their old devices and services, giving people more control over their data. Users are able to switch between devices efficiently. == Threats == === Malware === Malware (or malicious software) is designed to destroy, corrupt or gain unauthorized access to a computer for the purpose of stealing or destroying data. Hackers who use malware typically combine many types, including computer viruses, computer worms, ransomware, spyware and Trojan horses, to create a vast system of disruption and make data theft easy. Healthcare workers are among the victims of such disruption: their systems are compromised by infections and their data is then attacked. === Phishing === Phishing is a type of scam in which hackers use psychological and social engineering tactics (exploiting human emotions such as trust and fear) to trick people into giving up personal data through emails and messages, and to install computer viruses if the individual unknowingly clicks on a malicious link. Attackers can create websites that closely resemble the originals, which makes a fake website difficult to detect and leads individuals to give away information. Phishing attackers exploit human emotion, such as making victims feel fear, urgency, sympathy with the message

    Read more →
  • Protecting Our Kids from Social Media Addiction Act

    Protecting Our Kids from Social Media Addiction Act

    The Protecting Our Kids from Social Media Addiction Act, also known as California SB 976, is a law enacted in September 2024 that is meant to address problematic social media usage among minors. The law prohibits minors from having "addictive feeds" unless they have verifiable parental consent. Notifications to minors are also restricted between 12 am and 6 am and during school hours between 8 am and 3 pm. The law also requires default privacy settings for minors and requires social media companies to publicly disclose certain metrics about their users. The law was set to take effect in two steps: the restrictions on social media feeds and notifications, the disclosures from social media companies and the default settings would have taken effect on January 1, 2025, and the age verification provision would have taken effect on January 1, 2027. However, the law has faced legal challenges since its enactment, delaying its implementation. == Legal Challenges == In November 2024, NetChoice, a trade association representing many of the biggest social media companies such as YouTube, Facebook and Instagram, sued the attorney general of California, Rob Bonta, hoping to get an injunction before the first set of the law's provisions would take effect in January of the next year. However, Judge Edward Davila granted NetChoice's request only as to the restrictions on notifications and public disclosures and denied it as to the rest of the law. The law was later fully enjoined temporarily by the district court and the appellate court pending appeal, and the case is now before the Ninth Circuit Court of Appeals, pending a decision. === Social media platforms' challenges to the law === In November 2025, Meta, Google and TikTok filed lawsuits against the law, arguing that it violates the First Amendment.

    Read more →
  • Relational data mining

    Relational data mining

    Relational data mining is the data mining technique for relational databases. Unlike traditional data mining algorithms, which look for patterns in a single table (propositional patterns), relational data mining algorithms look for patterns among multiple tables (relational patterns). For most types of propositional patterns, there are corresponding relational patterns. For example, there are relational classification rules (relational classification), relational regression trees, and relational association rules. There are several approaches to relational data mining: Inductive Logic Programming (ILP), Statistical Relational Learning (SRL), Graph Mining, Propositionalization, and Multi-view learning. == Algorithms == Multi-Relation Association Rules: Multi-Relation Association Rules (MRAR) are a new class of association rules. In contrast to primitive, simple and even multi-relational association rules (which are usually extracted from multi-relational databases), each rule item in an MRAR consists of one entity but several relations. These relations indicate an indirect relationship between the entities. Consider the following MRAR, where the first item consists of three relations, live in, nearby and humid: "Those who live in a place which is nearby a city with humid climate type and also are younger than 20 -> their health condition is good". Such association rules are extractable from RDBMS data or semantic web data. == Software == Safarii: a data mining environment for analysing large relational databases based on a multi-relational data mining engine. Dataconda: software, free for research and teaching purposes, that helps mine relational databases without the use of SQL. == Datasets == Relational dataset repository: a collection of publicly available relational datasets.
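    To make the MRAR example from the Algorithms section concrete, the following Python sketch evaluates that rule over a few toy relations. All names and data here are hypothetical, and support and confidence are used as the standard association-rule measures; this shows how one rule can be checked, not how MRARs are mined.

```python
# Toy relations (hypothetical data). Each dict plays the role of one table.
lives_in = {"ann": "p1", "bob": "p2", "cat": "p1", "dan": "p3"}  # person -> place
nearby = {"p1": "c1", "p2": "c2", "p3": "c1"}                    # place -> city
climate = {"c1": "humid", "c2": "dry"}                           # city -> climate type
age = {"ann": 18, "bob": 17, "cat": 35, "dan": 19}
health = {"ann": "good", "bob": "good", "cat": "good", "dan": "poor"}


def antecedent(person: str) -> bool:
    """Lives in a place that is nearby a humid city, and is younger than 20."""
    # Follow the chain of relations: person -> place -> city -> climate.
    city = nearby.get(lives_in.get(person))
    return climate.get(city) == "humid" and age[person] < 20


def rule_stats() -> tuple[float, float]:
    """Return (support, confidence) of 'antecedent -> health is good'."""
    matching = [p for p in lives_in if antecedent(p)]
    satisfied = [p for p in matching if health[p] == "good"]
    support = len(satisfied) / len(lives_in)
    confidence = len(satisfied) / len(matching) if matching else 0.0
    return support, confidence
```

    With this data, two people satisfy the antecedent and one of them is in good health, so the rule has support 0.25 and confidence 0.5.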

    Read more →
  • Atomicity (database systems)

    Atomicity (database systems)

    In database systems, atomicity (from Ancient Greek: ἄτομος, romanized: átomos, lit. 'undividable') is the property of a database transaction consisting of an indivisible and irreducible series of database operations such that either all occur, or none occur. It is one of the ACID transaction properties: Atomicity, Consistency, Isolation, Durability. A guarantee of atomicity prevents partial database updates from occurring, because they can cause greater problems than rejecting the whole series outright. As a consequence, an atomic transaction cannot be observed to be in progress by another database client: at one moment in time, it has not yet happened, and at the next it has already occurred in whole (or nothing happened if the transaction was cancelled in progress). An example of transaction atomicity could be a digital monetary transfer from bank account A to account B. It consists of two operations: debiting the money from account A and crediting it to account B. Performing both of these operations inside an atomic transaction ensures that the database remains in a consistent state: if either operation fails, there will not be any unaccountable credits or debits affecting either account. The same term is also used in the definition of First normal form in database systems, where it instead refers to the concept that the values for fields may not consist of multiple smaller values to be decomposed, such as a string into which multiple names, numbers, dates, or other types may be packed. == Orthogonality == Atomicity does not behave completely orthogonally with regard to the other ACID properties of transactions. For example, isolation relies on atomicity to roll back the enclosing transaction in the event of an isolation violation such as a deadlock; consistency also relies on atomicity to roll back the enclosing transaction in the event of a consistency violation by an illegal transaction.
As a result, a failure to detect a violation and roll back the enclosing transaction may cause an isolation or consistency failure. == Implementation == Typically, systems implement atomicity by providing some mechanism to indicate which transactions have started and which have finished, or by keeping a copy of the data before any changes occurred (Read-copy-update). Several filesystems have developed methods for avoiding the need to keep multiple copies of data, using journaling (see journaling file system). Databases usually implement this using some form of logging/journaling to track changes. The system synchronizes the logs (often the metadata) as necessary after changes have successfully taken place. Afterwards, crash recovery ignores incomplete entries. Although implementations vary depending on factors such as concurrency issues, the principle of atomicity (i.e. complete success or complete failure) remains. Ultimately, any application-level implementation relies on operating-system functionality. At the file-system level, POSIX-compliant systems provide system calls such as open(2) and flock(2) that allow applications to atomically open or lock a file. At the process level, POSIX Threads provide adequate synchronization primitives. The hardware level requires atomic operations such as Test-and-set, Fetch-and-add, Compare-and-swap, or Load-Link/Store-Conditional, together with memory barriers. Portable operating systems cannot simply block interrupts to implement synchronization, since hardware that lacks concurrent execution such as hyper-threading or multi-processing is now extremely rare. In distributed and sharded databases, atomicity is complicated by network latency and the potential for partial failures. While traditional distributed systems often employ locking protocols (like 2PC) to ensure cross-shard atomicity, these can introduce performance bottlenecks.
Recent research into distributed ledger consensus suggests alternative models, such as "braided synchronization". This technique, utilized in protocols like Cerberus, intertwines the consensus phases of multiple shards to enforce atomic guarantees without a global ordering of all transactions.
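The bank transfer example described earlier can be sketched with Python's built-in `sqlite3` module. The table layout, the account names and the helper names `make_demo_db`, `transfer` and `balances` are assumptions made for this illustration. Used as a context manager, a `sqlite3` connection commits when the block succeeds and rolls back when it raises, so the debit and the credit take effect together or not at all.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Debit `src` and credit `dst` as a single atomic transaction."""
    with conn:  # commits on success, rolls back if the block raises
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        # If either account does not exist, abort; the debit is undone as well.
        if debit.rowcount != 1 or credit.rowcount != 1:
            raise LookupError("unknown account")


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

In this sketch, a transfer to a non-existent account raises `LookupError` after the debit has already been executed, and the rollback restores the balance of the source account.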

    Read more →
  • Data Transformation Services

    Data Transformation Services

    Data Transformation Services (DTS) is a Microsoft database tool with a set of objects and utilities to allow the automation of extract, transform and load operations to or from a database. The objects are DTS packages and their components, and the utilities are called DTS tools. DTS was included with earlier versions of Microsoft SQL Server, and was almost always used with SQL Server databases, although it could be used independently with other databases. DTS allows data to be transformed and loaded from heterogeneous sources using OLE DB, ODBC, or text-only files, into any supported database. DTS can also allow automation of data import or transformation on a scheduled basis, and can perform additional functions such as FTPing files and executing external programs. In addition, DTS provides an alternative method of version control and backup for packages when used in conjunction with a version control system, such as Microsoft Visual SourceSafe. DTS has been superseded by SQL Server Integration Services in later releases of Microsoft SQL Server though there was some backwards compatibility and ability to run DTS packages in the new SSIS for a time. == History == In SQL Server versions 6.5 and earlier, database administrators (DBAs) used SQL Server Transfer Manager and Bulk Copy Program, included with SQL Server, to transfer data. These tools had significant shortcomings, and many DBAs used third-party tools such as Pervasive Data Integrator to transfer data more flexibly and easily. With the release of SQL Server 7 in 1998, "Data Transformation Services" was packaged with it to replace all these tools. The concept, design, and implementation of the Data Transformation Services was led by Stewart P. MacLeod (SQL Server Development Group Program Manager), Vij Rajarajan (SQL Server Lead Developer), and Ted Hart (SQL Server Lead Developer). 
The goal was to make it easier to import, export, and transform heterogeneous data and simplify the creation of data warehouses from operational data sources. SQL Server 2000 expanded DTS functionality in several ways. It introduced new types of tasks, including the ability to FTP files, move databases or database components, and add messages into Microsoft Message Queue. DTS packages can be saved as a Visual Basic file in SQL Server 2000, and this can be expanded to save into any COM-compliant language. Microsoft also integrated packages into Windows 2000 security and made DTS tools more user-friendly; tasks can accept input and output parameters. DTS comes with all editions of SQL Server 7 and 2000, but was superseded by SQL Server Integration Services in the Microsoft SQL Server 2005 release in 2005. == DTS packages == The DTS package is the fundamental logical component of DTS; every DTS object is a child component of the package. Packages are used whenever one modifies data using DTS. All the metadata about the data transformation is contained within the package. Packages can be saved directly in a SQL Server, or can be saved in the Microsoft Repository or in COM files. SQL Server 2000 also allows a programmer to save packages in a Visual Basic or other language file (when stored to a VB file, the package is actually scripted—that is, a VB script is executed to dynamically create the package objects and its component objects). A package can contain any number of connection objects, but does not have to contain any. These allow the package to read data from any OLE DB-compliant data source, and can be expanded to handle other sorts of data. The functionality of a package is organized into tasks and steps. A DTS Task is a discrete set of functionalities executed as a single step in a DTS package. Each task defines a work item to be performed as part of the data movement and data transformation process or as a job to be executed. 
Data Transformation Services supplies a number of tasks that are part of the DTS object model and that can be accessed graphically through the DTS Designer or accessed programmatically. These tasks, which can be configured individually, cover a wide variety of data copying, data transformation and notification situations. For example, the following types of tasks represent some actions that you can perform by using DTS: executing a single SQL statement, sending an email, and transferring a file with FTP. A step within a DTS package describes the order in which tasks are run and the precedence constraints that describe what to do in case of failure. These steps can be executed sequentially or in parallel. Packages can also contain global variables which can be used throughout the package. SQL Server 2000 allows input and output parameters for tasks, greatly expanding the usefulness of global variables. DTS packages can be edited, password protected, scheduled for execution, and retrieved by version. == DTS tools == DTS tools packaged with SQL Server include the DTS wizards, DTS Designer, and DTS Programming Interfaces. === DTS wizards === The DTS wizards can be used to perform simple or common DTS tasks. These include the Import/Export Wizard and the Copy Database Wizard. They provide the simplest method of copying data between OLE DB data sources. A great deal of functionality is not available through the wizards alone. However, a package created with a wizard can be saved and later altered with one of the other DTS tools. A Create Publishing Wizard is also available to schedule packages to run at certain times. This only works if SQL Server Agent is running; otherwise the package will be scheduled, but will not be executed. === DTS Designer === The DTS Designer is a graphical tool used to build complex DTS Packages with workflows and event-driven logic.
DTS Designer can also be used to edit and customize DTS Packages created with the DTS wizard. Each connection and task in DTS Designer is shown with a specific icon. These icons are joined with precedence constraints, which specify the order and requirements for tasks to be run. One task may run, for instance, only if another task succeeds (or fails). Other tasks may run concurrently. The DTS Designer has been criticized for having unusual quirks and limitations, such as the inability to visually copy and paste multiple tasks at one time. Many of these shortcomings have been overcome in SQL Server Integration Services, DTS's successor. === DTS Query Designer === A graphical tool used to build queries in DTS. === DTS Run Utility === DTS Packages can be run from the command line using the DTSRUN Utility. The utility is invoked using the following syntax: dtsrun [ [ /S server_name[\instance_name] { {/[~]U user_name [/[~]P password]} | /E } ] { {/[~]N package_name } | {/[~]G package_guid_string} | {/[~]V package_version_guid_string} } [/[~]M package_password] [/[~]F filename] [/[~]R repository_database_name] [/A global_variable_name:typeid=value] [/L log_file_name] [/W NT_event_log_completion_status] [/Z] [/!X] [/!D] [/!Y] [/!C] ] When passing in parameters that are mapped to Global Variables, you are required to include the typeid. This is rather difficult to find on the Microsoft site. Below are the TypeIds used in passing in these values.

    Read more →