AI Data Operations

AI Data Operations — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Transderivational search

    Transderivational search

    Transderivational search (often abbreviated to TDS) is a psychological and cybernetics term, meaning when a search is being conducted for a fuzzy match across a broad field. In computing the equivalent function can be performed using content-addressable memory. Unlike usual searches, which look for literal (i.e. exact, logical, or regular expression) matches, a transderivational search is a search for a possible meaning or possible match as part of communication, and without which an incoming communication cannot be made any sense of whatsoever. It is thus an integral part of processing language, and of attaching meaning to communication. In NLP (Neuro-linguistic programming), a transderivational search (Bandler and Grinder, 1976) is essentially the process of searching back through one's stored memories and mental representations to find the personal reference experiences from which a current understanding or mental map has been derived. By the end of 1976, Grinder and Bandler had combined Satir’s and Perls’ language patterns and Erickson’s hypnotic language and use of metaphor with anchoring to create new processes that they called collapsing anchors, trans-derivational search, changing personal history, and reframing. A psychological example of TDS is in Ericksonian hypnotherapy, where vague suggestions are used that the patient must process intensely in order to find their own meanings, thus ensuring that the practitioner does not intrude his own beliefs into the subject's inner world. == TDS in human communication and processing == Because TDS is a compelling, automatic and unconscious state of internal focus and processing (i.e. a type of everyday trance state), and often a state of internal lack of certainty, or openness to finding an answer (since something is being checked out at that moment), it can be utilized or interrupted, in order to create, or deepen, trance. TDS is a fundamental part of human language and cognitive processing. Arguably, every word or utterance a person hears, for example, and everything they see or feel and take note of, results in a very brief trance while TDS is carried out to establish a contextual meaning for it. === Examples === Leading statements: "And those thoughts you had yesterday..." the human mind cannot process hearing this phrase, without at some level searching internally for some thoughts or other that it had yesterday, to make the subject of the sentence. "The many colors that fruit can be" likewise starts the human mind considering even if briefly, different fruit sorted by color. "You did it again, didn't you!" This everyday manipulative use of TDS usually sends the recipient looking internally for some "it" they may have done for which blame is being fairly given. Regardless of whether such a matter can be identified, guilt or anger may result. "There has been pain, hasn't there" the mind of a patient suffering an illness will find it very hard or impossible to hear or answer this sentence without conducting internal searches to verify whether this is true or not, or to find an example if so. "You'd forgotten something [or: some part of your body], hadn't you?" the mind usually checks through the various things, or parts of the body, on hearing this, seeing if each in turn has been forgotten. Textual ambiguity: "Do you remember line dancing on the steps?" Without sufficient context, some statements may trigger TDS in order to resolve inherent ambiguity in the interpretation of a posed question. Do I remember a bygone fad called "line dancing on the steps"? Do I remember personally engaging in dancing in the past? Do I remember my routine practice dancing by focusing on the steps of the dance? Do I tend to forget about dancing when I am standing on steps? "Penny-wise and pound the table dance to the beat of a different drummer". The mixing of cliché and stock phrases may trigger TDS in order to reconcile the discrepancies between expected and actual utterances in sequence. Although TDS is often associated with spoken language, it can be induced in any perceptual system. Thus Milton Erickson's "hypnotic handshake" is a technique that leaves the other person performing TDS in search of meaning to a deliberately ambiguous use of touch.

    Read more →
  • Open-source software security

    Open-source software security

    Open-source software security is the measure of assurance or guarantee in the freedom from danger and risk inherent to an open-source software system. == Implementation debate == === Benefits === Proprietary software forces the user to accept the level of security that the software vendor is willing to deliver and to accept the rate that patches and updates are released. It is assumed that any compiler that is used creates code that can be trusted, but it has been demonstrated by Ken Thompson that a compiler can be subverted using a compiler backdoor to create faulty executables that are unwittingly produced by a well-intentioned developer. With access to the source code for the compiler, the developer has at least the ability to discover if there is any mal-intention. Kerckhoffs' principle is based on the idea that an enemy can steal a secure military system and not be able to compromise the information. His ideas were the basis for many modern security practices, and followed that security through obscurity is a bad practice. === Drawbacks === Simply making source code available does not guarantee review. An example of this occurring is when Marcus Ranum, an expert on security system design and implementation, released his first public firewall toolkit. At one time, there were over 2,000 sites using his toolkit, but only 10 people gave him any feedback or patches. Having a large amount of eyes reviewing code can "lull a user into a false sense of security". Having many users look at source code does not guarantee that security flaws will be found and fixed. == Metrics and models == There are a variety of models and metrics to measure the security of a system. These are a few methods that can be used to measure the security of software systems. === Number of days between vulnerabilities === It is argued that a system is most vulnerable after a potential vulnerability is discovered, but before a patch is created. By measuring the number of days between the vulnerability and when the vulnerability is fixed, a basis can be determined on the security of the system. There are a few caveats to such an approach: not every vulnerability is equally bad, and fixing a lot of bugs quickly might not be better than only finding a few and taking a little bit longer to fix them, taking into account the operating system, or the effectiveness of the fix. === Poisson process === The Poisson process can be used to measure the rates at which different people find security flaws between open and closed source software. The process can be broken down by the number of volunteers Nv and paid reviewers Np. The rates at which volunteers find a flaw is measured by λv and the rate that paid reviewers find a flaw is measured by λp. The expected time that a volunteer group is expected to find a flaw is 1/(Nv λv) and the expected time that a paid group is expected to find a flaw is 1/(Np λp). === Morningstar model === By comparing a large variety of open source and closed source projects a star system could be used to analyze the security of the project similar to how Morningstar, Inc. rates mutual funds. With a large enough data set, statistics could be used to measure the overall effectiveness of one group over the other. An example of such as system is as follows: 1 Star: Many security vulnerabilities. 2 Stars: Reliability issues. 3 Stars: Follows best security practices. 4 Stars: Documented secure development process. 5 Stars: Passed independent security review. === Coverity scan === Coverity in collaboration with Stanford University has established a new baseline for open-source quality and security. The development is being completed through a contract with the Department of Homeland Security. They are utilizing innovations in automated defect detection to identify critical types of bugs found in software. The level of quality and security is measured in rungs. Rungs do not have a definitive meaning, and can change as Coverity releases new tools. Rungs are based on the progress of fixing issues found by the Coverity Analysis results and the degree of collaboration with Coverity. They start with Rung 0 and currently go up to Rung 2. Rung 0 The project has been analyzed by Coverity's Scan infrastructure, but no representatives from the open-source software have come forward for the results. Rung 1 At rung 1, there is collaboration between Coverity and the development team. The software is analyzed with a subset of the scanning features to prevent the development team from being overwhelmed. Rung 2 There are 11 projects that have been analyzed and upgraded to the status of Rung 2 by reaching zero defects in the first year of the scan. These projects include: AMANDA, ntp, OpenPAM, OpenVPN, Overdose, Perl, PHP, Postfix, Python, Samba, and Tcl.

    Read more →
  • Client honeypot

    Client honeypot

    Honeypots are security devices whose value lie in being probed and compromised. Traditional honeypots are servers (or devices that expose server services) that wait passively to be attacked. Client Honeypots are active security devices in search of malicious servers that attack clients. The client honeypot poses as a client and interacts with the server to examine whether an attack has occurred. Often the focus of client honeypots is on web browsers, but any client that interacts with servers can be part of a client honeypot (for example ftp, email, ssh, etc.). There are several terms that are used to describe client honeypots. Besides client honeypot, which is the generic classification, honeyclient is the other term that is generally used and accepted. However, there is a subtlety here, as "honeyclient" is actually a homograph that could also refer to the first known open source client honeypot implementation (see below), although this should be clear from the context. == Architecture == A client honeypot is composed of three components. The first component, a queuer, is responsible for creating a list of servers for the client to visit. This list can be created, for example, through crawling. The second component is the client itself, which is able to make a requests to servers identified by the queuer. After the interaction with the server has taken place, the third component, an analysis engine, is responsible for determining whether an attack has taken place on the client honeypot. In addition to these components, client honeypots are usually equipped with some sort of containment strategy to prevent successful attacks from spreading beyond the client honeypot. This is usually achieved through the use of firewalls and virtual machine sandboxes. Analogous to traditional server honeypots, client honeypots are mainly classified by their interaction level: high or low; which denotes the level of functional interaction the server can utilize on the client honeypot. In addition to this there are also newly hybrid approaches which denotes the usage of both high and low interaction detection techniques. == High interaction == High interaction client honeypots are fully functional systems comparable to real systems with real clients. As such, no functional limitations (besides the containment strategy) exist on high interaction client honeypots. Attacks on high interaction client honeypots are detected via inspection of the state of the system after a server has been interacted with. The detection of changes to the client honeypot may indicate the occurrence of an attack against that has exploited a vulnerability of the client. An example of such a change is the presence of a new or altered file. High interaction client honeypots are very effective at detecting unknown attacks on clients. However, the tradeoff for this accuracy is a performance hit from the amount of system state that has to be monitored to make an attack assessment. Also, this detection mechanism is prone to various forms of evasion by the exploit. For example, an attack could delay the exploit from immediately triggering (time bombs) or could trigger upon a particular set of conditions or actions (logic bombs). Since no immediate, detectable state change occurred, the client honeypot is likely to incorrectly classify the server as safe even though it did successfully perform its attack on the client. Finally, if the client honeypots are running in virtual machines, then an exploit may try to detect the presence of the virtual environment and cease from triggering or behave differently. === Capture-HPC === Capture [1] is a high interaction client honeypot developed by researchers at Victoria University of Wellington, NZ. Capture differs from existing client honeypots in various ways. First, it is designed to be fast. State changes are being detected using an event based model allowing to react to state changes as they occur. Second, Capture is designed to be scalable. A central Capture server is able to control numerous clients across a network. Third, Capture is supposed to be a framework that allows to utilize different clients. The initial version of Capture supports Internet Explorer, but the current version supports all major browsers (Internet Explorer, Firefox, Opera, Safari) as well as other HTTP aware client applications, such as office applications and media players. === HoneyClient === HoneyClient [2] is a web browser based (IE/FireFox) high interaction client honeypot designed by Kathy Wang in 2004 and subsequently developed at MITRE. It was the first open source client honeypot and is a mix of Perl, C++, and Ruby. HoneyClient is state-based and detects attacks on Windows clients by monitoring files, process events, and registry entries. It has integrated the Capture-HPC real-time integrity checker to perform this detection. HoneyClient also contains a crawler, so it can be seeded with a list of initial URLs from which to start and can then continue to traverse web sites in search of client-side malware. === HoneyMonkey (dead since 2010) === HoneyMonkey [3] is a web browser based (IE) high interaction client honeypot implemented by Microsoft in 2005. It is not available for download. HoneyMonkey is state based and detects attacks on clients by monitoring files, registry, and processes. A unique characteristic of HoneyMonkey is its layered approach to interacting with servers in order to identify zero-day exploits. HoneyMonkey initially crawls the web with a vulnerable configuration. Once an attack has been identified, the server is reexamined with a fully patched configuration. If the attack is still detected, one can conclude that the attack utilizes an exploit for which no patch has been publicly released yet and therefore is quite dangerous. === SHELIA (dead since 2009) === Shelia [4] is a high interaction client honeypot developed by Joan Robert Rocaspana at Vrije Universiteit Amsterdam. It integrates with an email reader and processes each email it receives (URLs & attachments). Depending on the type of URL or attachment received, it opens a different client application (e.g. browser, office application, etc.) It monitors whether executable instructions are executed in data area of memory (which would indicate a buffer overflow exploit has been triggered). With such an approach, SHELIA is not only able to detect exploits, but is able to actually ward off exploits from triggering. === UW Spycrawler === The Spycrawler [5] developed at the University of Washington is yet another browser based (Mozilla) high interaction client honeypot developed by Moshchuk et al. in 2005. This client honeypot is not available for download. The Spycrawler is state based and detects attacks on clients by monitoring files, processes, registry, and browser crashes. Spycrawlers detection mechanism is event based. Further, it increases the passage of time of the virtual machine the Spycrawler is operating in to overcome (or rather reduce the impact of) time bombs. === Web Exploit Finder === WEF [6] is an implementation of an automatic drive-by-download – detection in a virtualized environment, developed by Thomas Müller, Benjamin Mack and Mehmet Arziman, three students from the Hochschule der Medien (HdM), Stuttgart during the summer term in 2006. WEF can be used as an active HoneyNet with a complete virtualization architecture underneath for rollbacks of compromised virtualized machines. == Low interaction == Low interaction client honeypots differ from high interaction client honeypots in that they do not utilize an entire real system, but rather use lightweight or simulated clients to interact with the server. (in the browser world, they are similar to web crawlers). Responses from servers are examined directly to assess whether an attack has taken place. This could be done, for example, by examining the response for the presence of malicious strings. Low interaction client honeypots are easier to deploy and operate than high interaction client honeypots and also perform better. However, they are likely to have a lower detection rate since attacks have to be known to the client honeypot in order for it to detect them; new attacks are likely to go unnoticed. They also suffer from the problem of evasion by exploits, which may be exacerbated due to their simplicity, thus making it easier for an exploit to detect the presence of the client honeypot. === HoneyC === HoneyC [7] is a low interaction client honeypot developed at Victoria University of Wellington by Christian Seifert in 2006. HoneyC is a platform independent open source framework written in Ruby. It currently concentrates driving a web browser simulator to interact with servers. Malicious servers are detected by statically examining the web server's response for malicious strings through the usage of Snort signatures. === Monkey-Spider (dead since 2008) === Monkey-Spider [8] is a low-interaction client honeypot i

    Read more →
  • Geofence warrant

    Geofence warrant

    A geofence warrant or a reverse location warrant is a search warrant issued by a court to allow law enforcement to search a database to find all active mobile devices within a particular geo-fence area. Courts have granted law enforcement geo-fence warrants to obtain information from databases such as Google's Sensorvault, which collects users' historical geolocation data. Geo-fence warrants are a part of a category of warrants known as reverse search warrants. == History == Geofence warrants were first used in 2016. Google reported that it had received 982 such warrants in 2018, 8,396 in 2019, and 11,554 in 2020. A 2021 transparency report showed that 25% of data requests from law enforcement to Google were geo-fence data requests. Google is the most common recipient of geo-fence warrants and the main provider of such data, although companies including Apple, Snapchat, Lyft, and Uber have also received such warrants. == Legality == === United States === Some lawyers and privacy experts believe reverse search warrants are unconstitutional under the Fourth Amendment to the United States Constitution, which protects people from unreasonable searches and seizures, and requires any search warrants be specific to what and to whom they apply. The Fourth Amendment specifies that warrants may only be issued "upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized." Some lawyers, legal scholars, and privacy experts have likened reverse search warrants to general warrants, which were made illegal by the Fourth Amendment. Groups including the Electronic Frontier Foundation have opposed geo-fence warrants in amicus briefs filed in motions to quash such orders to disclose geo-fence data. In 2024, a panel of the United States Fourth Circuit Court of Appeals considered data acquired from Google’s Sensorvault not to be a search, but non-private business records when users opt-in to Google’s location history. However, upon a rehearing en banc, the Court vacated that decision. In April 2025, the full Court affirmed the judgment solely on the 'good faith' exception, leaving the underlying constitutional question of whether geofence warrants constitute a search unsettled in the Circuit. However, the United States Fifth Circuit Court of Appeals found that geofence warrants are "categorically prohibited by the Fourth Amendment." The split in Circuits prompted the United States Supreme Court to agree to hear Chatrie v. United States in January 2026.

    Read more →
  • The Future of Work and Death

    The Future of Work and Death

    The Future of Work and Death is a 2016 documentary by Sean Blacknell and Wayne Walsh about the exponential growth of technology. The film showed at several film festivals including Raindance Film Festival, International Film Festival Rotterdam, Academia Film Olomouc and CPH:DOX. In May 2017 it received an official screening at the European Commission. It was distributed by First Run Features and Journeyman Pictures and was released on iTunes, Amazon Prime and On-demand on 9 May 2017. The film was made available on Sundance Now on 27 November 2017. A companion piece to the film, The Cost of Living, a documentary concerning universal basic income in Britain, was released on Amazon Prime on 8 October 2020. == Synopsis == World experts in the fields of futurology, anthropology, neuroscience, and philosophy consider the impact of technological advances on the two 'certainties' of human life; work and death. Charting human developments from Homo habilis, past the Industrial Revolution, to the digital age and beyond, the film looks at the shocking exponential rate at which mankind has managed to create technologies to ease the process of living. As we embark on the next phase of our adaptation, with automation and artificial intelligence signifying the complete move from man to machine, the film asks what the implications are for human fulfilment in an approaching era of job obsolescence and extreme longevity. == Cast == Dudley Sutton – Narrator Aubrey de Grey – Biomedical gerontologist and CSO of the SENS Research Foundation Will Self – Writer, journalist, political commentator and Professor of Contemporary Thought at Brunel University Rudolph E. Tanzi – Professor of Neurology at Harvard University and Director of the Genetics and Aging Research Unit at Massachusetts General Hospital (MGH) Martin Ford – Futurist and author Steve Fuller – Auguste Comte Chair in Social Epistemology at the Department of sociology at University of Warwick Murray Shanahan – Professor of Cognitive Robotics at Imperial College London Gray Scott – Futurist, executive producer of this production Vivek Wadhwa – Entrepreneur, academic and Director of Research at the Center for Entrepreneurship and Research Commercialization at the Pratt School of Engineering, Duke University Zoltan Istvan – Transhumanist and journalist Joanna Cook – Anthropologist, University College London Nicholas Kamara – Physician, Kable Hospital David Pearce – Transhumanist philosopher and co-founder of Humanity+ Peter Cochrane – Futurist and entrepreneur John Harris – Bioethicist, philosopher and Director of the Institute for Science, Ethics and Innovation at the University of Manchester Riva Melissa-Tez – Entrepreneur and transhumanist Ian Pearson – Futurologist Stuart Armstrong – Artificial intelligence researcher at Future of Humanity Institute

    Read more →
  • Screenless video

    Screenless video

    Screenless video is any system for transmitting visual information from a video source without the use of a screen. Screenless computing systems can be divided into three groups: Visual Image, Retinal Direct, and Synaptic Interface. == Visual image == Visual Image screenless display includes any image that the eye can perceive. The most common example of Visual Image screenless display is a hologram. In these cases, light is reflected off some intermediate object (hologram, LCD panel, or cockpit window) before it reaches the retina. In the case of LCD panels the light is refracted from the back of the panel, but is nonetheless a reflected source. Google has proposed a similar system to replace the screens of tablet computers and smartphones. == Retinal display == Virtual retinal display systems are a class of screenless displays in which images are projected directly onto the retina. They are distinguished from visual image systems because light is not reflected from some intermediate object onto the retina, it is instead projected directly onto the retina. Retinal Direct systems, once marketed, hold out the promise of extreme privacy when computing work is done in public places because most snooping relies on viewing the same light as the person who is legitimately viewing the screen, and retinal direct systems send light only into the pupils of their intended viewer. == Synaptic interface == Synaptic Interface screenless video does not use light at all. Visual information completely bypasses the eye and is transmitted directly to the brain. While such systems have only been implemented in humans in rudimentary form - for example, displaying single Braille characters to blind people – success has been achieved in sampling usable video signals from the biological eyes of a living horseshoe crab through their optic nerves, and in sending video signals from electronic cameras into the creatures' brains using the same method.

    Read more →
  • NRD Cyber Security

    NRD Cyber Security

    NRD Cyber Security is a Lithuanian company that provides cybersecurity solutions, consulting, and other services. The organization specializes in CSIRT and SOC creation, modernization and training. It has helped to establish national and sectorial CSIRTs around the world, including countries, such as Bangladesh, Egypt, Bhutan, Kosovo, Malawi and others. NRD Cyber Security was found in 2013 to provide quality cybersecurity services to nations and organizations. In 2018 it was included in The Deloitte Technology Fast 50 in Europe list. In 2024 it was awarded the #98 place in MSSP Alert Top 250 world's managed security service providers. The company is a member of various cybersecurity organizations, such as Forum of Incident Response and Security Teams (FIRST), The Global Forum on Cyber Expertise (GFCE), Unicrons Lt. It is a strategic partner of The Global Cyber Security Capacity Centre (GCSCC) at University of Oxford.

    Read more →
  • Swap chain

    Swap chain

    In computer graphics, a swap chain (also swapchain) is a series of virtual framebuffers used by the graphics card and graphics API for frame rate stabilization, stutter reduction, and several other purposes. Because of these benefits, many graphics APIs require the use of a swap chain. The swap chain usually exists in graphics memory, but it can exist in system memory as well. A swap chain with two buffers is a kind of double buffer. == Function == In every swap chain there are at least two buffers. The first framebuffer, the screenbuffer, is the buffer that is rendered to the output of the video card. The remaining buffers are known as backbuffers. Each time a new frame is displayed, the first backbuffer in the swap chain takes the place of the screenbuffer, this is called presentation or swapping. A variety of other actions may be taken on the previous screenbuffer and other backbuffers (if they exist). The screenbuffer may be simply overwritten or returned to the back of the swap chain for further processing. The action taken is decided by the client application and is API dependent. == Direct3D == Microsoft Direct3D implements a SwapChain class. Each host device has at least one swap chain assigned to it, and others may be created by the client application. The API provides three methods of swapping: copy, discard, and flip. When the SwapChain is set to flip, the screenbuffer is copied onto the last backbuffer, then all the existing backbuffers are copied forward in the chain. When copy is set, each backbuffer is copied forward, but the screenbuffer is not wrapped to the last buffer, leaving it unchanged. Flip does not work when there is only one backbuffer, as the screenbuffer is copied over the only backbuffer before it can be presented. In discard mode, the driver selects the best method. == Comparison with triple buffering == Outside the context of Direct3D, triple buffering refers to the technique of allowing an application to draw to whichever back buffer was least recently updated. This allows the application to always proceed with rendering, regardless of the pace at which frames are being drawn by the application or the pace at which frames are being sent to the display. Triple buffering may result in a frame being discarded without being displayed if two or more newer frames are completely rendered in the time it takes for one frame to be sent to the display. By contrast, Direct3D swap chains are a strict first-in, first-out queue, so every frame that is drawn by the application will be displayed even if newer frames are available. Direct3D does not implement a most-recent buffer swapping strategy, and Microsoft's documentation calls a Direct3D swap chain of three buffers "triple buffering". Triple buffering as described above is superior for interactive purposes such as gaming, but Direct3D swap chains of more than three buffers can be better for tasks such as presenting frames of a video where the time taken to decode each frame may be highly variable.

    Read more →
  • Bag-of-words model

    Bag-of-words model

    The bag-of-words (BoW) model is a model of text which uses an unordered collection (a "bag") of words. It is used in natural language processing and information retrieval (IR). It disregards word order (and thus most of syntax or grammar) but captures multiplicity. The bag-of-words model is commonly used in methods of document classification where, for example, the (frequency of) occurrence of each word is used as a feature for training a classifier. It has also been used for computer vision. An early reference to "bag of words" in a linguistic context can be found in Zellig Harris's 1954 article on Distributional Structure. == Definition == The following models a text document using bag-of-words. Here are two simple text documents: Based on these two text documents, a list is constructed as follows for each document: Representing each bag-of-words as a JSON object, and attributing to the respective JavaScript variable: Each key is the word, and each value is the number of occurrences of that word in the given text document. The order of elements is free, so, for example {"too":1,"Mary":1,"movies":2,"John":1,"watch":1,"likes":2,"to":1} is also equivalent to BoW1. It is also what we expect from a strict JSON object representation. Note: if another document is like a union of these two, its JavaScript representation will be: So, as we see in the bag algebra, the "union" of two documents in the bags-of-words representation is, formally, the disjoint union, summing the multiplicities of each element. === Word order === The BoW representation of a text removes all word ordering. For example, the BoW representation of "man bites dog" and "dog bites man" are the same, so any algorithm that operates with a BoW representation of text must treat them in the same way. Despite this lack of syntax or grammar, BoW representation is fast and may be sufficient for simple tasks that do not require word order. For instance, for document classification, if the words "stocks" "trade" "investors" appears multiple times, then the text is likely a financial report, even though it would be insufficient to distinguish between Yesterday, investors were rallying, but today, they are retreating.andYesterday, investors were retreating, but today, they are rallying.and so the BoW representation would be insufficient to determine the detailed meaning of the document. == Implementations == Implementations of the bag-of-words model might involve using frequencies of words in a document to represent its contents. The frequencies can be "normalized" by the inverse of document frequency, or tf–idf. Additionally, for the specific purpose of classification, supervised alternatives have been developed to account for the class label of a document. Lastly, binary (presence/absence or 1/0) weighting is used in place of frequencies for some problems (e.g., this option is implemented in the WEKA machine learning software system). == Hashing trick == A common alternative to using dictionaries is the hashing trick, where words are mapped directly to indices with a hash function. When using a hash function, no memory is required to store a dictionary. In practice, hashing simplifies the implementation of bag-of-words models and improves scalability. Collisions can occur when two words are hashed to the same index, but this happens infrequently and may function as a form of regularization.

    Read more →
  • Comparison of operating systems

    Comparison of operating systems

    These tables provide a comparison of operating systems, of computer devices, as listing general and technical information for a number of widely used and currently available PC or handheld (including smartphone and tablet computer) operating systems. The article "Usage share of operating systems" provides a broader, and more general, comparison of operating systems that includes servers, mainframes and supercomputers. Because of the large number and variety of available Linux distributions, they are all grouped under a single entry; see comparison of Linux distributions for a detailed comparison. There is also a variety of BSD and DOS operating systems, covered in comparison of BSD operating systems and comparison of DOS operating systems. == Nomenclature == The nomenclature for operating systems varies among providers and sometimes within providers. For purposes of this article the terms used are; kernel In some operating systems, the OS is split into a low level region called the kernel and higher level code that relies on the kernel. Typically the kernel implements processes but its code does not run as part of a process. hybrid kernel monolithic kernel Nucleus In some operating systems there is OS code permanently present in a contiguous region of memory addressable by unprivileged code; in IBM systems this is typically referred to as the nucleus. The nucleus typically contains both code that requires special privileges and code that can run in an unprivileged state. Typically some code in the nucleus runs in the context of a dispatching unit, e.g., address space, process, task, thread, while other code runs independent of any dispatching unit. In contemporary operating systems unprivileged applications cannot alter the nucleus. License and pricing policies vary widely among different systems. Among others, the tables below use the following terms: BSD BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. bundled The fee is included in the price of the hardware == General information == == Technical information == == Security == == Commands == For POSIX compliant (or partly compliant) systems like FreeBSD, Linux, macOS or Solaris, the basic commands are the same because they are standardized. NOTE: Linux systems may vary by distribution which specific program, or even 'command' is called, via the POSIX alias function. For example, if you wanted to use the DOS dir to give you a directory listing with one detailed file listing per line you could use alias dir='ls -lahF' (e.g. in a session configuration file).

    Read more →
  • NIS2 Directive

    NIS2 Directive

    The Directive (EU) 2022/2555, commonly known as NIS2 is a directive of the European Union aimed at protecting digital infrastructure, in particular critical infrastructure. It broadened the sectors covered by EU network and information security rules and updated incident reporting and oversight compared to the NIS1. Member States were required to transpose NIS2 by 17 October 2024, and the earlier NIS Directive was repealed on 18 October 2024. Only 23 Member States have fully implemented the measures contained with the NIS Directive. Infringement proceedings against them to enforce the Directive have not taken place, and they are not expected to take place in the near future. This failed implementation has led to the fragmentation of cybersecurity capabilities across the EU, with differing standards, incident reporting requirements and enforcement requirements being implemented in different Member States. From the EFTA countries (to April 2026) only Liechtenstein has fully transposed the NIS2 Directive. While the EFTA commission is conducting preparations to transpose the directive into its legislation. == National implementations == === Czech Republic === It is implemented through the Act No. 264/2025 Coll. also called Zákon o kybernetické bezpečnosti (Cybersecurity law) and through another five implementing regulations. The transposing legislation came into force on November 1st, 2025. === Germany === It is implemented through the Gesetz zur Umsetzung der NIS-2-Richtlinie und zur Regelung wesentlicher Grundzüge des Informationssicherheitsmanagements in der Bundesverwaltung. === Ireland === It is implemented through the National Cyber Security Bill. === The Netherlands === It is implemented through the Cyberbeveiligingswet (Cbw). === Slovakia === It is implemented through via an amendment of the Act No. 69/2018 Coll. also called Zákon o kybernetickej bezpečnosti a o zmene a doplnení niektorých zákonov (Law on Cybersecurity and change and amendment of certain laws). It came into force on November 1st, 2025. === Spain === It is implemented through the Esquema Nacional de Seguridad (ENS).

    Read more →
  • World Database of Happiness

    World Database of Happiness

    The World Database of Happiness is a web-based archive of research findings on subjective appreciation of life, based in the Erasmus Happiness Economics Research Organization of the Erasmus University Rotterdam in The Netherlands. The database contains both an overview of scientific publications on happiness and a digest of research findings. Happiness is defined as the degree to which an individual judges the quality of his or her life as a whole favorably. Two 'components' of happiness are distinguished: hedonic level of affect (the degree to which pleasant affect dominates) and contentment (perceived realization of wants). == Aims == The World Database of Happiness is a tool to quickly acquire an overview on the ever-growing stream of research findings on happiness Medio 2023 the database covered some 16,000 scientific publications on happiness, from which were extracted 23,000 distributional findings (on how happy people are) and another 24,000 correlational findings (on factors associated with more and less happiness). The first findings date from 1915. == Technique == The World Database of Happiness is a ‘findings archive’, which consists of electronic ‘finding pages’ on which separate research results are described in a standard format and terminology. These finding pages can be selected on various characteristics, such as population studies, the measure of happiness used and observed co-variates. All finding-pages have a specific internet address to which links can be made in scientific review papers or policy recommendations. This allows a concise presentation of many findings in a table, while providing readers with access to detail. == Scientific use == The Database has been cited in 254 scientific papers, for example to access under what conditions economic growth enhances average happiness or to show that rising mean happiness at first raises happiness inequality, but further rise will diminish these differences, or that healthy eating is associated with more happiness, even after controlling for the effect on health Another finding is that relative simple happiness training techniques raise happiness by some 5% == Popular use == The World Database of Happiness is often used by popular media to make lists of the happiest countries around the globe. An example is the Happy Planet Index, which aims to chart sustainable happiness all over the world by combining data on longevity, happiness and the size of the ecological footprint of citizens. == Strengths and weaknesses == The database has a clear conceptual focus, it includes only research findings on subjective enjoyment of one's life as a whole. Thereby it evades the Babel that has haunted the study of happiness for ages. The other side of that coin is that much interesting research is left out. The findings are reported with technical details about measurement and statistical analysis. This detail is welcomed by scholars, but makes the information difficult to digest for lay-persons. Still another limitation is that the determinants of happiness appear to vary considerably across persons and situations, which make it hard to draw general conclusions about the causes of happiness. What is clear is that poor health, separation, unemployment and lack of social contact are all strongly negatively associated with happiness. Another problem for the World database of happiness is that the studies on happiness increase with such a high rate that it gets increasingly difficult to offer a complete overview of all research findings. A further concern is that the Database of Happiness is exclusively focused on hedonic happiness (feeling good) and not on mature happiness that might exist in the face of suffering

    Read more →
  • PCVC Speech Dataset

    PCVC Speech Dataset

    The PCVC (Persian Consonant Vowel Combination) Speech Dataset is a Modern Persian speech corpus for speech recognition and also speaker recognition. The dataset contains sound samples of Modern Persian combination of vowel and consonant phonemes from different speakers. Every sound sample contains just one consonant and one vowel So it is somehow labeled in phoneme level. This dataset consists of 23 Persian consonants and 6 vowels. The sound samples are all possible combinations of vowels and consonants (138 samples for each speaker). The sample rate of all speech samples is 48000 which means there are 48000 sound samples in every 1 second. Every sound sample starts with consonant then continues with vowel. In each sample, in average, 0.5 second of each sample is speech and the rest is silence. Each sound sample ends with silence. All of sound samples are denoised with "Adaptive noise reduction" algorithm. Compared to Farsdat speech dataset and Persian speech corpus it is more easy to use because it is prepared in .mat data files. Also it is more based on phoneme based separation and all samples are denoised. == Contents == The corpus is downloadable from its Kaggle web page, and contains the following: .mat data files of sound samples in a 23630000 matrix, in which 23 is number of consonants, 6 is the number of vowels and 30000 is the length of sound sample.

    Read more →
  • Security.txt

    Security.txt

    security.txt is an accepted standard for website security information that allows security researchers to report security vulnerabilities easily. The standard prescribes a text file named security.txt in the well known location, similar in syntax to robots.txt but intended to be machine and human readable, for those wishing to contact a website's owner about security issues. security.txt files have been adopted by Google, GitHub, LinkedIn, and Facebook. == History == The Internet Draft was first submitted by Edwin Foudil in September 2017. At that time it covered four directives, "Contact", "Encryption", "Disclosure" and "Acknowledgement". Foudil expected to add further directives based on feedback. In addition, web security expert Scott Helme said he had seen positive feedback from the security community while use among the top 1 million websites was "as low as expected right now". In 2019, the Cybersecurity and Infrastructure Security Agency (CISA) published a draft binding operational directive that requires all US federal agencies to publish a security.txt file within 180 days. The Internet Engineering Steering Group (IESG) issued a Last Call for security.txt in December 2019 which ended on January 6, 2020. A study in 2021 found that over ten percent of top-100 websites published a security.txt file, with the percentage of sites publishing the file decreasing as more websites were considered. The study also noted a number of discrepancies between the standard and the content of the file. In April 2022 the security.txt file has been accepted by Internet Engineering Task Force (IETF) as RFC 9116. == File format == security.txt files can be served under the /.well-known/ directory (i.e. /.well-known/security.txt) or the top-level directory (i.e. /security.txt) of a website. The file must be served over HTTPS and in plaintext format.

    Read more →
  • Mozilla VPN

    Mozilla VPN

    Mozilla VPN is an open-source virtual private network developed by Mozilla. It launched in beta as Firefox Private Network on September 10, 2019, and officially launched on July 15, 2020, as Mozilla VPN. Mozilla VPN should not be confused with the built-in VPN in Firefox since version 149 released in March 2026, which is free with a monthly data limit of 50 GB but only masks traffic that originates in Firefox unlike Mozilla VPN that protects the entire device. == History == The Firefox Private Network web browser extension beta version was released on September 10, 2019, as part of the relaunch of Mozilla's Test Pilot Program, a program that allowed Firefox users to test experimental new features which had been shuttered in January 2019. The beta of the subscription-based standalone virtual private network for Android, Microsoft Windows, and Chromebook launched on February 19, 2020, with the iOS version following soon after. Firefox Private Network was rebranded as "Mozilla VPN" on June 18, 2020, and officially launched as Mozilla VPN on July 15, 2020. At launch, Mozilla VPN was available in six countries (the United States, Canada, the United Kingdom, Singapore, Malaysia, and New Zealand) for Windows 10, Android, and iOS (beta). Over time, the service also launched in Germany, France, Italy, Spain, Switzerland, Austria, Belgium, Netherlands, Ireland, Finland, Sweden, Poland, Czechia, Hungary, Romania, Bulgaria, Slovakia, Portugal, Denmark, Croatia, Lithuania, Slovenia, Latvia, Luxembourg, Estonia, Cyprus, and Malta. == Audits history == Cybersecurity firm Cure53 conducted a security audit for Mozilla VPN in August 2020 and identified multiple vulnerabilities, including one critical-severity vulnerability. In March 2021, Cure53 conducted a second security audit, which noted significant improvements since the 2020 audit. The second audit identified multiple issues, including two medium-severity and one high-severity vulnerability, but concluded that by the time of publication, only one vulnerability remained unresolved, and that it would require "a strong state-funded attacker-model" to be exploitable. Mozilla disclosed most of the vulnerabilities in July 2021 and released the full report by Cure53 in August 2021. In April 2023, Cure53 conducted a third security audit, the results of which Mozilla disclosed in December that year, along with the full report by Cure53. == Features == Mozilla VPN masks the user's IP address, hiding the user's location data from the websites accessed by the user, and encrypts all network activity. The service allows for up to 5 simultaneous connections, to any of more than 500 servers in 30+ countries, and is available on the mobile operating systems iOS and Android and the desktop operating systems Microsoft Windows, macOS and Linux. Mozilla VPN's infrastructure is provided by the Swedish Mullvad VPN service, which uses the WireGuard VPN protocol. The VPN software comes with additional features, like recommended server locations, the ability to block ads, block ad trackers and malware, the ability to exclude certain applications from protection, the ability to set multi-hop connections, and to set custom DNS servers. When used with Firefox and the official extension, Mozilla VPN allows the use of different settings per container as well as bypassing the VPN for specific websites.

    Read more →