AI Art Creator Free

AI Art Creator Free — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • GEPIR

    GEPIR

    GEPIR (Global Electronic Party Information Registry) was a distributed database operated and owned by GS1 that contains basic information on over 1,000,000 companies in over 100 countries. The database could be searched by Global Trade Item Number (GTIN) code (including Universal Product Code (UPC) and EAN-13 codes), container Code (Serial Shipping Container Code (SSCC)), location number (Global Location Number (GLN)), and (in some countries) the company name. A SOAP webservice existed for API access. As of end December 2023, GEPIR was replaced by a service called Verified by GS1. While it operated, GEPIR had more than 1 million members in more than 100 countries. In 2013, all GS1 111 member organisations joined GEPIR. == Access == GEPIR was accessible for free in almost all countries but the number of request per day was limited (from 20 to 30). Since October 2013, GS1 France restricts access to GEPIR to companies (registration with SIREN code was required to use it). A premium access service had been created by GS1 France in January 2010 which allows companies to use GS1 web and SOAP interface without any limit. == System architecture == GEPIR was a lookup service coordinated by the GS1 GO that provided all end users with the ability to look up information about GS1 Identification Keys. Depending on the service, systems were provided by GS1 Member Organisations (MOs) or 3rd party service providers, or both. Where a GS1 MO did not choose to provide the service directly to its end users, the GS1 Global Office provided the service for that geography. Some services involved a technical component deployed by the GS1 Global Office that coordinates the systems provided by GS1 MOs and/or 3rd party service providers. The GEPIR service was provided by systems deployed by GS1 MOs, with the GS1 GO providing a central point of coordination to federate the local systems. The GS1 GO also provides the MO-level service for MOs that could not or did not wish to deploy their own system.

    Read more →
  • Data refuge

    Data refuge

    Data Refuge is a public and collaborative project designed to address concerns about federal climate and environmental data that is in danger of being lost. In particular, the initiative addresses five main concerns: What are the best ways to safeguard data? How do federal agencies play a crucial role in collecting, managing, and distributing data? How do government priorities impact data's accessibility? Which projects and research fields depend on federal data? Which data sets are of value to research and local communities, and why? Data Refuge began as a grassroots organization in opposition to government data on climate change and the environment not being archived systemically. Data Refuge's main goal is to collect and allocate data in multiple safe locations to create a sustainable way of archiving old and new data. Data Refuge was initiated in 2016 to protect federal climate and environmental data that is vulnerable under an administration that denies climate change. The system aims to make public research-quality copies of federal climate and environmental data. Data Refuge is supported by the National Geographic Foundation, private donors, Libraries+ Network, Preserving Electronic Governance Initiative (PEGI), the Union of Concerned Scientists (USC), and the Penn Program in Environmental Humanities (PPEH). == Types of data == Data Refuge collects public federal data on the climate and environment in the form of satellite imagery, PDFs, and stories. The data are stored in multiple trusted locations as they are less vulnerable if in only one location, and to ensure accessibility for researchers. Through the Data Rescue events, Data Refuge has accumulated 4 terabytes of data, 30,000 URLs, and 800 participants. === Storytelling === Data Refuge collects stories on vulnerable federal climate and environmental data through: surveys, oral history, photo essays, maps, video shorts, and animations. The stories are archived in a public bank that showcase how federal environmental data support health and safety in communities. Data Stories are collected at Data Rescue events, which are partnered with universities, city and town halls, and advocacy groups. Data stories are collected and used to emphasize the importance of Data Refuge, in how the data on climate change and the environment are being used by people in the United States and across the world for meaningful practices.

    Read more →
  • Consumer relationship system

    Consumer relationship system

    Consumer relationship systems (CRS) are specialized customer relationship management (CRM) software applications that are used to handle a company's dealings with its customers. Current consumer relationship systems integrate the software with telephone and call recording systems as well as with corporate systems for input and reporting. Customers can provide input from the company's website directly into the CRS. These systems are popular because they can deliver the 'voice of the consumer' that contributes to product quality improvement and that ultimately increases corporate profits. Consumer relationship systems that provide automated support as well as advanced systems may have artificial intelligence (AI) interfaces that can extract and analyse data collected, or handle basic questions and complaints. == History == The first CRS was developed in the 1980s. In 1981 Michael Wilke and Robert Thornton founded Wilke/Thornton, Inc in Columbus, Ohio, to develop new CRS software.

    Read more →
  • Commit (data management)

    Commit (data management)

    In computer science and data management, a commit is a behavior that marks the end of a transaction and provides Atomicity, Consistency, Isolation, and Durability (ACID) in transactions. The submission records are stored in the submission log for recovery and consistency in case of failure. In terms of transactions, the opposite of committing is giving up tentative changes to the transaction, which is rolled back. Due to the rise of distributed computing and the need to ensure data consistency across multiple systems, commit protocols have been evolving since their emergence in the 1970s. The main developments include the Two-Phase Commit (2PC) first proposed by Jim Gray, which is the fundamental core of distributed transaction management. Subsequently, the Three-phase Commit (3PC), Hypothesis Commit (PC), Hypothesis Abort (PA), and Optimistic Commit protocols gradually emerged, solving the problems of blocking and fault recovery. Today, new fields such as e-commerce payment and blockchain technology are emerging, and submission protocols play a significant role in various business areas. By effectively handling transactions, resolving faults and recovering problems, the commit protocol becomes crucial in ensuring the reliability and consistency of data management. == History == The concept of Commit originated in the late 1960s and early 1970s, when computer technology was rapidly advancing and data management was becoming an important requirement in business and finance. Enterprises have gradually replaced the traditional paper records with computers, which has fully improved the work efficiency. The reliability and consistency of data have become a necessary requirement. Transaction management at this stage is relatively simple, limited to using a single computer for processing. It merely effectively records the changes in data to ensure that the data remains stable after the transaction is completed or terminated. In the late 1970s, as database systems moved from a single calculator operation to multiple distributed collaborations, ensuring data consistency and reliability became a new challenge. In 1978, computer scientist Jim Gray proposed the famous two-phase Commit Protocol (2PC), which became an effective solution for distributed transaction management, successfully managing data synchronization problems between multiple nodes. However, this commit protocol has some potential transaction blocking problems when nodes fail. In the early 1980s, researchers discovered that although the two-step commit protocol was effective at synchronizing data, there could be long waits and even system crashes, with limitations. To improve this problem, people have begun to explore new and effective methods, including enhancing efficiency by reducing message communication during the protocol process. IBM's R database introduced the Assumed Commit and Assumed abort protocols, which contributed significantly to transaction management efficiency. These two protocols have greatly improved the processing efficiency of distributed transactions by reducing communication overhead and have become an important breakthrough in the technology of transaction commit protocols. By the early 1990s, with the increase in business demands and the complexity of transactions, enterprises required higher efficiency in distributed transaction processing. In order to adapt to the needs of different environments, the scientific community has gradually developed various variants of commit protocols to provide more flexible transaction management options for different needs. For example, the three-phase commit protocol promotes the commit of transactions more effectively and reduces the occurrence of blocking problems by adding a pre-commit protocol and a timeout mechanism. In the 21st century, with the popularization of mobile Internet and wireless technology, the commit protocol has been further developed, and researchers have begun to pay attention to how to reduce the blocking in the transaction process to solve the problem of broadband limitation, battery life and network instability in the mobile environment. The proposal of optimistic commit protocol marks the extension of commit technology from traditional database to the emerging mobile data field. This protocol allows transactions to temporarily use unconfirmed data, improving the user experience in cases of poor network conditions. In recent years, with the rise of blockchain and decentralized technologies, submission protocols and consensus mechanisms have gradually merged. These consensus algorithms play a role in tamper-proofing and preventing malicious attacks on node pairs in a decentralized environment. This enables commit to no longer be confined to the scope of traditional database management, but to become the core technology of trust computing and distributed ledgers, further expanding the application field of commit in the digital age. This integration has brought about extensive application impacts. Each transaction can achieve the effect of tracking global submissions through the verification of the consensus mechanism, becoming an important technical foundation for promoting the circulation of digital assets, the operation of cryptocurrencies and decentralized applications. == Commit Protocol Types == In the world of data management, a transaction is a series of database operations, such as bank transfers and order submission. In order to ensure the accuracy, consistency, and security of the data, transactions are usually completed completely, or cancelled completely, leaving no partially completed results. Commit protocol is the method used to coordinate this process. Different protocols are applicable to different submission scenarios and have their own advantages and disadvantages. There are four major commit protocols. === Two-Phase Commit (2PC) === The two-phase commit protocol is the most classic and broadest approach to distributed transactions, which includes both a preparation phase and a commit phase. This commit protocol is designed to allow the database coordinator to determine if all participating nodes agree. The preparation phase is the phase in which the coordination node sends a ready to commit request to all nodes participating in the transaction. The commit phase is a global commit after all participating nodes are ready, and if no agreement is reached, all nodes roll back the transaction and undo all previous operations. Although the two-phase commit protocol is the easiest to operate and widely used, its obvious drawback is that it can cause transactions to be blocked for a long time when nodes fail, resulting in a decline in system performance and making it difficult to terminate or continue immediately. === Three-Phase Commit (3PC) === The three-phase commit protocol is an improved non-blocking protocol based on 2PC, which is divided into three stages: preparation, pre-commit and commit. Firstly, each node sends a "preparation" request. After confirmation, a "pre-submission" stage is added. At this point, each node has completed most of the preparatory work and is waiting for the final confirmation. Finally, in the formal commit stage, after all nodes send the "commit" request, the transaction is completed and committed. Compared with 2PC, it increases the timeout mechanism, avoids the blocking problem caused by single point of failure, and improves the reliability of the system. The three-phase commit protocol significantly optimizes transaction reliability, but adds additional overhead for message transmission and state maintenance. It is more suitable for distributed application scenarios with high transaction sensitivity and no acceptance of long waiting times. === Presumed Commit (PC) and Presumed Abort (PA) === Presumed Commit (PC) is the default that the transaction will be committed successfully and rollback will be notified unless an anomaly is encountered. This commit reduces the message overhead and logging costs of a normal commits. Presumed Abort (PA) is assumed that the default state of the transaction is a rollback and will only be committed when all nodes have explicitly agreed. This commit is applicable to transactions that are not updated frequently or have a low probability of successful commit. The IBM R Distributed Database management System was the first to propose and practice the PC and PA protocols, handling distributed transaction management very efficiently and becoming a classic case in the field of database transaction management. === Optimistic Commit Protocol === With the rise of the Internet, the previous commit protocols are facing new challenges, especially in mobile scenarios with unstable networks. Excessively long transaction waiting times can affect the user experience. The Optimistic Commit Protocol allows a transaction to temporarily access uncommitted data before committing to avoid wait times. This type of commit is suitable f

    Read more →
  • Lesk algorithm

    Lesk algorithm

    The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986. It operates on the premise that words within a given context are likely to share a common meaning. This algorithm compares the dictionary definitions of an ambiguous word with the words in its surrounding context to determine the most appropriate sense. Variations, such as the Simplified Lesk algorithm, have demonstrated improved precision and efficiency. However, the Lesk algorithm has faced criticism for its sensitivity to definition wording and its reliance on brief glosses. Researchers have sought to enhance its accuracy by incorporating additional resources like thesauruses and syntactic models. == Overview == The Lesk algorithm is based on the assumption that words in a given "neighborhood" (section of text) will tend to share a common topic. A simplified version of the Lesk algorithm is to compare the dictionary definition of an ambiguous word with the terms contained in its neighborhood. Versions have been adapted to use WordNet. An implementation might look like this: for every sense of the word being disambiguated one should count the number of words that are in both the neighborhood of that word and in the dictionary definition of that sense the sense that is to be chosen is the sense that has the largest number of this count. A frequently used example illustrating this algorithm is for the context "pine cone". The following dictionary definitions are used: PINE 1. kinds of evergreen tree with needle-shaped leaves 2. waste away through sorrow or illness CONE 1. solid body which narrows to a point 2. something of this shape whether solid or hollow 3. fruit of certain evergreen trees As can be seen, the best intersection is Pine #1 ⋂ Cone #3 = 2. == Simplified Lesk algorithm == In Simplified Lesk algorithm, the correct meaning of each word in a given context is determined individually by locating the sense that overlaps the most between its dictionary definition and the given context. Rather than simultaneously determining the meanings of all words in a given context, this approach tackles each word individually, independent of the meaning of the other words occurring in the same context. "A comparative evaluation performed by Vasilescu et al. (2004) has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. By evaluating the disambiguation algorithms on the Senseval-2 English all words data, they measure a 58% precision using the simplified Lesk algorithm compared to the only 42% under the original algorithm. Note: Vasilescu et al. implementation considers a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all their possible meanings lead to zero overlap with current context or with other word definitions are by default assigned sense number one in WordNet." Simplified LESK Algorithm with smart default word sense (Vasilescu et al., 2004) The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way. == Criticisms == Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions. A lot of work has appeared offering different modifications of this algorithm. These works use other resources for analysis (thesauruses, synonyms dictionaries or morphological and syntactic models): for instance, it may use such information as synonyms, different derivatives, or words from definitions of words from definitions. == Lesk variants == Original Lesk (Lesk, 1986) Adapted/Extended Lesk (Banerjee and Pederson, 2002/2003): In the adaptive lesk algorithm, a word vector is created corresponds to every content word in the wordnet gloss. Concatenating glosses of related concepts in WordNet can be used to augment this vector. The vector contains the co-occurrence counts of words co-occurring with w in a large corpus. Adding all the word vectors for all the content words in its gloss creates the Gloss vector g for a concept. Relatedness is determined by comparing the gloss vector using the Cosine similarity measure. There are a lot of studies concerning Lesk and its extensions: Wilks and Stevenson, 1998, 1999; Mahesh et al., 1997; Cowie et al., 1992; Yarowsky, 1992; Pook and Catlett, 1988; Kilgarriff and Rosensweig, 2000; Kwong, 2001; Nastase and Szpakowicz, 2001; Gelbukh and Sidorov, 2004.

    Read more →
  • Harvest now, decrypt later

    Harvest now, decrypt later

    Harvest now, decrypt later (HNDL) is a surveillance strategy that relies on the acquisition and long-term storage of currently unreadable encrypted data awaiting possible breakthroughs in decryption technology that would render it readable in the future—a hypothetical date referred to as Y2Q (a reference to Y2K), or Q-Day. The most common concern is the prospect of developments in quantum computing which would allow current strong encryption algorithms to be broken at some time in the future, making it possible to decrypt any stored material that had been encrypted using those algorithms. However, the improvement in decryption technology need not be due to a quantum-cryptographic advance; any other form of attack capable of enabling decryption would be sufficient. The existence of this strategy has led to concerns about the need to urgently deploy post-quantum cryptography; even though no practical quantum attacks yet exist, some data stored now may still remain sensitive even decades into the future. As of 2022, the U.S. federal government has proposed a roadmap for organizations to start migrating toward quantum-cryptography-resistant algorithms to mitigate these threats. This new version of Commercial National Security Algorithm Suite uses publicly-available algorithms and is allowed for government use up to the TOP SECRET level. == Terminology and scope == The term “harvest now, decrypt later” encompasses various surveillance or espionage operations in which ciphertext or encrypted communications are collected today with the view that they may one day be decrypted, given sufficient advances in computing power or cryptanalysis. The abbreviation HNDL is sometimes used in technical and policy documents. The “Y2Q” (or “Q-Day”) label draws an analogy to the Y2K date-change issue, emphasising a potential future point at which current cryptography may collapse. The strategy is particularly relevant for data with long confidentiality lifetimes, such as diplomatic communications, personal health records, critical infrastructure logs, or intellectual property. == Mitigation strategies == The primary defense against HNDL attacks is the transition to post-quantum cryptography (PQC), which utilizes algorithms believed to be secure against quantum computer attacks. However, because PQC protects the data payload digitally, rather than the transmission itself, the encrypted data can still be harvested and stored. A complementary approach involves physical layer security (also known as optical layer encryption or photonic shielding). Unlike algorithmic encryption, this method modifies the optical waveform itself—often by burying the signal within optical noise or using spectral phase encoding—to render the transmission unrecordable by standard receivers. By preventing the attacker from capturing a valid signal in the first place, this approach aims to eliminate the "harvest" phase of the threat. Commercial implementations of harvest-proof optical encryption have been developed by firms such as CyberRidge to secure long-haul fiber networks. Field trials have demonstrated 100 Gbps throughput over legacy DWDM networks using this method.

    Read more →
  • Data steward

    Data steward

    A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for. Data stewards have a specialist role that utilizes an organization's data governance processes, policies, guidelines and responsibilities for administering an organizations' entire data in compliance with policy and/or regulatory obligations (e.g., GDPR, HIPAA). The overall objective of a data steward is the data quality of the data assets, datasets, data records and data elements. This includes documenting metainformation for the data, such as definitions, related rules/governance, physical manifestation, and related data models (most of these properties being specific to an attribute/concept relationship), identifying owners/custodian's various responsibilities, relations insight pertaining to attribute quality, aiding with project requirement data facilitation and documentation of capture rules. Data stewards begin the stewarding process with the identification of the data assets and elements which they will steward, with the ultimate result being standards, controls and data entry. The steward works closely with business glossary standards analysts (for standards), with data architect/modelers (for standards), with DQ analysts (for controls) and with operations team members (good-quality data going in per business rules) while entering data. Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources. Master data management often makes references to the need for data stewardship for its implementation to succeed. Data stewardship must have precise purpose, fit for purpose or fitness. == Data steward responsibilities == A data steward ensures that each assigned data element: Has clear and unambiguous data element definition Does not conflict with other data elements in the metadata registry (removes duplicates, overlap etc.) Has clear enumerated value definitions if it is of type Code Is still being used (remove unused data elements) Is being used consistently in various computer systems Is being used, fit for purpose = Data Fitness Has adequate documentation on appropriate usage and notes Documents the origin and sources of authority on each metadata element Is protected against unauthorised access or change Responsibilities of data stewards vary between different organisations and institutions. For example, at Delft University of Technology, data stewards are perceived as the first contact point for any questions related to research data. They also have subject-specific background allowing them to easily connect with researchers and to contextualise data management problems to take into account disciplinary practices. == Types of data stewards == Depending on the set of data stewardship responsibilities assigned to an individual, there are 4 types (or dimensions of responsibility) of data stewards typically found within an organization: Data object data steward - responsible for managing reference data and attributes of one business data entity Business data steward - responsible for managing critical data, both reference and transactional, created or used by one business function. The data steward may also serve as a liaison between the organization's data users and technical teams, helping to bridge the gap between business needs and technical requirements. They may also play a role in educating others within the organization about best practices for data management, and advocating for data-driven decision-making. Process data steward - responsible for managing data across one business process System data steward - responsible for managing data for at least one IT system == Benefits of data stewardship == Systematic data stewardship can foster: Faster analysis Consistent use of data management resources Easy mapping of data between computer systems and exchange documents Lower costs associated with migration to (for example) service-oriented architecture (SOA) Mitigation of data risk Better control of dangers associated with privacy, legal, errors, etc. Assignment of each data element to a person sometimes seems like an unimportant process. But multiple groups have found that users have greater trust and usage rates in systems where they can contact a person with questions on each data element. == Examples == Delft University of Technology (TU Delft) offers an example of data stewardship implementation at a research institution. In 2017 the Data Stewardship Project was initiated at TU Delft to address research data management needs in a disciplinary manner across the whole campus. Dedicated data stewards with subject-specific background were appointed at every TU Delft faculty to support researchers with data management questions and to act as a linking point with the other institutional support services. The project is coordinated centrally by TU Delft Library, and it has its own website, blog and a YouTube channel. The [1]EPA metadata registry furnishes an example of data stewardship. Note that each data element therein has a "POC" (point of contact). In 2023, ETH Zurich launched the Data Stewardship Network (DSN) to facilitate collaboration among employees engaged in data management, analysis, and code development across research groups. The DSN serves as a platform for networking and knowledge exchange, aiming to professionalize the role of data stewards who support research data management and reproducible workflows. Established by the team for Research Data Management and Digital Curation at the ETH Library, the DSN collaborates with Scientific IT Services to provide expertise in areas such as storage infrastructure and reproducible workflows. == Data stewardship applications == Information stewardship applications are business solutions used by business users acting in the role of information steward (interpreting and enforcing information governance policy, for example). These developing solutions represent, for the most part, an amalgam of a number of disparate, previously IT-centric tools already on the market, but are organized and presented in such a way that information stewards (a business role) can support the work of information policy enforcement as part of their normal, business-centric, day-to-day work in a range of use cases. The initial push for the formation of this new category of packaged software came from operational use cases — that is, use of business data in and between transactional and operational business applications. This is where most of the master data management efforts are undertaken in organizations. However, there is also now a faster-growing interest in the new data lake arena for more analytical use cases.

    Read more →
  • List of cryptosystems

    List of cryptosystems

    A cryptosystem is a set of cryptographic algorithms that map ciphertexts and plaintexts to each other. == Private-key cryptosystems == Private-key cryptosystems use the same key for encryption and decryption. Caesar cipher Substitution cipher Enigma machine Data Encryption Standard Twofish Serpent Camellia Salsa20 ChaCha20 Blowfish CAST5 Kuznyechik RC4 3DES Skipjack Safer IDEA Advanced Encryption Standard, also known as AES and Rijndael. == Public-key cryptosystems == Public-key cryptosystems use a public key for encryption and a private key for decryption. Diffie–Hellman key exchange RSA encryption Rabin cryptosystem Schnorr signature ElGamal encryption Elliptic-curve cryptography Lattice-based cryptography McEliece cryptosystem Multivariate cryptography Isogeny-based cryptography

    Read more →
  • Free Studio

    Free Studio

    Free Studio is a freeware set of multimedia computer programs developed by DVDVideoSoft. The programs are available in one integrated package and also as separate downloads (Free Studio Manager is included in both). == Overview == The Free Studio software bundle consists of about 48 programs, grouped into several sections: YouTube, MP3 & Audio, CD-DVD-BD, DVD & Video, Photo & Images, Mobiles, Apple Devices, and 3D. The largest group is the DVD & Video section containing 14 different applications. Mobiles section is the second largest group with 13 programs. However, the YouTube section, particularly YouTube downloading programs, has gained more popularity among users. The programs have been tested and endorsed by a dozen of software portals and have won awards from these sites. Free Studio is most popular in Germany, Greece, Italy, and the United States. It is also popular in Japan, France, and the United Kingdom. Some of the programs in the package are free and open-source software. == History == DVDVideoSoft project was launched in 2006 by company Digital Wave Ltd., for software development to produce multimedia application software. The founders distributed paid software as an affiliate at the start, later their own products appeared on the site. Free YouTube Download was the first successful program, then DVDVideoSoft created and launched several other 'Free YouTube' applications. Later on upon users' requests DVDVideoSoft started developing other kinds of applications including media converters etc. Today DVDVideoSoft offers up to 49 different programs for video, audio and image processing individually or integrated into the Free Studio package. == Features == DVDVideoSoft YouTube programs can be used to download YouTube videos in their original format and convert them to AVI, DVD, MP4, WMV etc. or different audio formats. YouTube section contains Free Video Call Recorder for Skype button, but the program itself is not included into FS installation (it has to be downloaded and installed separately). The "MP3 & Audio" section consists of the programs which convert audio files between different formats, convert audio files to Flash for web, extract audio from video files, edit audio files (Free Audio Dub), rip and burn CDs. Enclosed in the CD-DVD-BD section are the applications that enable users to burn files and folders to discs, to convert videos to a DVD format and vice versa, to burn CDs, and to copy music from audio CDs into files. The "DVD and Video" section contains several desktop video and DVD converters. Some of the programs can flip, rotate and cut (Free Video Dub) videos. One of the most popular programs from the section is Free Video Dub. Converted videos are now, contrary to previous versions, watermarked if no paid membership is present. Free Studio includes several applications for Apple phones, iPods and other devices. The Mobiles section contains a dozen video converters for various mobile devices such as cell phones, Tablets and Game consoles. They convert videos to play them on (BlackBerry, HTC, LG phones, Sony/Sony Ericsson, Nintendo, Xbox, Motorola phones, etc.) The "Photo & Images" section incorporates the programs for image conversion and resizing, extracting JPEG frames from videos (Free Video To JPEG Converter), recording screen activities, making screenshots (Free Screen Recorder). The 3D section is composed of the programs to make 3D videos and 3D images. There are several algorithms which allow to create different types of 3D images. == Supported formats == === Video formats === Input: .avi; .ivf; .div; .divx; .mpg; .mpeg; .mpe; .mp4; .m4v; .wmv; .asf; .webm; .mkv; .mov; .qt; .ts; .mts; .m2t; .m2ts; .mod; .tod; .vro; .dat; .3gp2; .3gpp; .3gp; .3g2; .dvr-ms; .flv; .f4v; .amv; .rm; .rmm; .rv; .rmvb; .ogv; DVD video Output: .mp4; .wmv; .avi; .mkv; .webm; .flv; .swf; .mov; .3gp; .m2ts; DVD video === Audio formats === Input: .mp3 .wav; .aac; .m4a; .m4b; .wma; .ogg; .flac; .ra; .ram; .amr; .ape; .mka; .tta; .aiff; .au; .mpc; .spx; .ac3; audio cd Output: .mp3; .m4a; .aac; .wav; .wma; .ogg; .flac; .ape; audio CD === Image formats === Input: .jpg, .png, .bmp, .gif, .tga Output: .jpg, .png, .bmp, .gif, .tga, .pdf == Reception == The programs have been tested and endorsed by Chip Online, Tucows, SnapFiles, Brothersoft, and Softonic and have won awards from these sites. Free Studio is most popular in Germany, United States and Italy. It is also popular in Japan, France and the United Kingdom. The most popular applications, according to CNET statistics, include Free YouTube to MP3 Converter, Free Video to MP3 Converter, Free MP4 Video Converter and Free YouTube Download. Other programs with high rank: Free AVI Video Converter, Free Video Editor, Free Audio Converter and Free Studio in a whole. == Criticism == Free Studio (as can be common for freeware packages) is criticized for toolbar and Web search engine installation. Older versions have also included OpenCandy, which is loaded automatically, with no request for user approval. There can be difficulties installing only the programs needed without installing bundled extra programs. In March 2017, DVDVideoSoft announced that it had stopped showing other products' ads during installation and removed all toolbars, search engines, and OpenCandy.

    Read more →
  • Instant messaging

    Instant messaging

    Instant messaging (IM) technology is a type of synchronous computer-mediated communication involving the immediate (real-time) transmission of messages between two or more parties over the Internet or another computer network. Originally involving simple text message exchanges, modern instant messaging applications and services (also variously known as instant messenger, messaging app, chat app, chat client, or simply a messenger) tend to also feature the exchange of multimedia, emojis, file transfer, VoIP (voice calling), and video chat capabilities. Instant messaging systems facilitate connections between specified known users (often using a contact list also known as a "buddy list" or "friend list") or in chat rooms, and can be standalone apps or integrated into a wider social media platform, or in a website where it can, for instance, be used for conversational commerce. Originally the term "instant messaging" was distinguished from "text messaging" by being run on a computer network instead of a cellular/mobile network, being able to write longer messages, real-time communication, presence ("status"), and being free (only cost of access instead of per SMS message sent). Instant messaging was pioneered in the early Internet era; the IRC protocol was the earliest to achieve wide adoption. Later in the 1990s, ICQ was among the first closed and commercialized instant messengers, and several rival services appeared afterwards as it became a popular use of the Internet. Beginning with its first introduction in 2005, BlackBerry Messenger became the first popular example of mobile-based IM, combining features of traditional IM and mobile SMS. Instant messaging remains very popular today; IM apps are the most widely used smartphone apps: in 2018 for instance there were 980 million monthly active users of WeChat and 1.3 billion monthly users of WhatsApp, the largest IM network. == Overview == Instant messaging (IM), sometimes also called "messaging" or "texting", consists of computer-based human communication between two users (private messaging) or more (chat room or "group") in real-time, allowing immediate receipt of acknowledgment or reply. This is in direct contrast to email, where conversations are not in real-time, and the perceived quasi-synchrony of the communications by the users (although many systems allow users to send offline messages that the other user receives when logging in). Earlier IM networks were limited to text-based communication, not dissimilar to mobile text messaging. As technology has moved forward, IM has expanded to include voice calling using a microphone, videotelephony using webcams, file transfer, location sharing, image and video transfer, voice notes, and other features. IM is conducted over the Internet or other types of networks (see also LAN messenger). Depending on the IM protocol, the technical architecture can be peer-to-peer (direct point-to-point transmission) or client–server (when all clients have to first connect to the central server). Primary IM services are controlled by their corresponding companies and usually follow the client-server model. At one point, the term "Instant Messenger" was a service mark of AOL Time Warner and could not be used in software not affiliated with AOL in the United States. For this reason, in April 2007, the instant messaging client formerly named Gaim (or gaim) announced that they would be renamed "Pidgin". === Clients === Modern IM services generally provide their own client, either a separately installed application or a browser-based client. They are normally centralised networks run by the servers of the platform's operators, unlike peer-to-peer protocols like XMPP. These usually only work within the same IM network, although some allow limited function with other services (see #Interoperability). Third-party client software applications exist that will connect with most of the major IM services. There is the class of instant messengers that uses the serverless model, which doesn't require servers, and the IM network consists only of clients. There are several serverless messengers: RetroShare, Tox, Bitmessage, Ricochet. See also: LAN messenger. Some examples of popular IM services today include Signal, Telegram, WhatsApp Messenger, WeChat, QQ Messenger, Viber, Line, and Snapchat. The popularity of certain apps greatly differ between different countries. Certain apps have an emphasis on certain uses - for example, Skype focuses on video calling, Slack focuses on messaging and file sharing for work teams, and Snapchat focuses on image messages. Some social networking services offer messaging services as a component of their overall platform, such as Facebook's Facebook Messenger, who also own WhatsApp. Others have a direct IM function as an additional adjunct component of their social networking platforms, like Instagram, Reddit, Tumblr, TikTok, Clubhouse and Twitter; this also includes for example dating websites, such as OkCupid or Plenty of Fish, and online gaming chat platforms. === Features === ==== Private and group messaging ==== Private chat allows users to converse privately with another person or a group. Privacy can also be enhanced in several ways, such as end-to-end encryption by default. Public and group chat features allow users to communicate with multiple people simultaneously. ==== Calling ==== Many major IM services and applications offer a call feature for user-to-user voice calls, conference calls, and voice messages. The call functionality is useful for professionals who utilize the application for work purposes and as a hands-free method. Videotelephony using a webcam is also possible by some. ==== Games and entertainment ==== Some IM applications include in-app games for entertainment. Yahoo! Messenger, for example, introduced these where users could play a game and viewed by friends in real-time. MSN Messenger featured a number of playable games within the interface. Facebook's Messenger has had a built-in option to play games with people in a chat, including games like Tetris and Blackjack. Discord features multiple games built inside the "activities" tab in voice channels. ==== Payments ==== A relatively new feature to instant messaging, peer-to-peer payments are available for financial tasks on top of communication. The lack of a service fee also makes these advantageous to financial applications. IM services such as Facebook Messenger and the WeChat 'super-app' for example offer a payment feature. == History == === Early systems === Though the term dates from the 1990s, instant messaging predates the Internet, first appearing on multi-user operating systems like Compatible Time-Sharing System (CTSS) and Multiplexed Information and Computing Service (Multics) in the mid-1960s. Initially, some of these systems were used as notification systems for services like printing, but quickly were used to facilitate communication with other users logged into the same machine. CTSS facilitated communication via text message for up to 30 people. Parallel to instant messaging were early online chat facilities, the earliest of which was Talkomatic (1973) on the PLATO system, which allowed 5 people to chat simultaneously on a 512 x 512 plasma display (5 lines of text + 1 status line per person). During the bulletin board system (BBS) phenomenon that peaked during the 1980s, some systems incorporated chat features which were similar to instant messaging; Freelancin' Roundtable was one prime example. The first such general-availability commercial online chat service (as opposed to PLATO, which was educational) was the CompuServe CB Simulator in 1980, created by CompuServe executive Alexander "Sandy" Trevor in Columbus, Ohio. As networks developed, the protocols spread with the networks. Some of these used a peer-to-peer protocol (e.g. talk, ntalk and ytalk), while others required peers to connect to a server (see talker and IRC). The Zephyr Notification Service (still in use at some institutions) was invented at MIT's Project Athena in the 1980s to allow service providers to locate and send messages to users. Early instant messaging programs were primarily real-time text, where characters appeared as they were typed. This includes the Unix "talk" command line program, which was popular in the 1980s and early 1990s. Some BBS chat programs (i.e. Celerity BBS) also used a similar interface. Modern implementations of real-time text also exist in instant messengers, such as AOL's Real-Time IM as an optional feature. In the latter half of the 1980s and into the early 1990s, the Quantum Link online service for Commodore 64 computers offered user-to-user messages between concurrently connected customers, which they called "On-Line Messages" (or OLM for short), and later "FlashMail." Quantum Link later became America Online and made AOL Instant Messenger (AIM, discussed later). While the Quantum Link client software ran on a Commodore 64, using only

    Read more →
  • Backdoor (computing)

    Backdoor (computing)

    A backdoor is a typically covert method of bypassing normal authentication or encryption in a computer, product, embedded device (e.g. a home router), or its embodiment (e.g. part of a cryptosystem, algorithm, chipset, or even a "homunculus computer"—a tiny computer-within-a-computer such as that found in Intel's AMT technology). Backdoors are most often used for securing remote access to a computer, or obtaining access to plaintext in cryptosystems. From there it may be used to gain access to privileged information like passwords, corrupt or delete data on hard drives, or transfer information within compromised networks. In the United States, the 1994 Communications Assistance for Law Enforcement Act forces internet providers to provide backdoors for government authorities. In 2024, the U.S. government realized that China had been tapping communications in the U.S. using that infrastructure for months, or perhaps longer; China recorded presidential candidate campaign office phone calls—including employees of the then-vice president of the nation, and of the candidates themselves. A backdoor may take the form of a hidden part of a program, a separate program (e.g. Back Orifice may subvert the system through a rootkit), code in the firmware of the hardware, or parts of an operating system such as Windows, for example, device drivers. Trojan horses can be used to create vulnerabilities in a device. A Trojan horse may appear to be an entirely legitimate program, but when executed, it triggers an activity that may install a backdoor. Although some are secretly installed, other backdoors are deliberate and widely known. These kinds of backdoors have "legitimate" uses such as providing the manufacturer with a way to restore user passwords. Many systems that store information within the cloud fail to create accurate security measures. If many systems are connected within the cloud, hackers can gain access to all other platforms through the most vulnerable system. Default passwords (or other default credentials) can function as backdoors if they are not changed by the user. Some debugging features can also act as backdoors if they are not removed in the release version. In 1993, the United States government attempted to deploy an encryption system, the Clipper chip, with an explicit backdoor for law enforcement and national security access. The chip was unsuccessful. Recent proposals to counter backdoors include creating a database of backdoors' triggers and then using neural networks to detect them. == Overview == The threat of backdoors surfaced when multiuser and networked operating systems became widely adopted. Petersen and Turn discussed computer subversion in a paper published in the proceedings of the 1967 AFIPS Conference. They noted a class of active infiltration attacks that use "trapdoor" entry points into the system to bypass security facilities and permit direct access to data. The use of the word trapdoor here clearly coincides with more recent definitions of a backdoor. However, since the advent of public key cryptography the term trapdoor has acquired a different meaning (see: Trapdoor function), and thus the term "backdoor" is now preferred, only after the term trapdoor went out of use. More generally, such security breaches were discussed at length in a RAND Corporation task force report published under DARPA sponsorship by J.P. Anderson and D.J. Edwards in 1970. While initially targeting the computer vision domain, backdoor attacks have expanded to encompass various other domains, including text, audio, ML-based computer-aided design, and ML-based wireless signal classification. Additionally, vulnerabilities in backdoors have been demonstrated in deep generative models, reinforcement learning (e.g., AI GO), and deep graph models. These broad-ranging potential risks have prompted concerns from national security agencies regarding their potentially disastrous consequences. A backdoor in a login system might take the form of a hard coded user and password combination which gives access to the system. An example of this sort of backdoor was used as a plot device in the 1983 film WarGames, in which the architect of the "WOPR" computer system had inserted a hardcoded password-less account which gave the user access to the system, and to undocumented parts of the system (in particular, a video game-like simulation mode and direct interaction with the artificial intelligence). Although the number of backdoors in systems using proprietary software (software whose source code is not publicly available) is not widely credited, they are nevertheless frequently exposed. Programmers have even succeeded in secretly installing large amounts of benign code as Easter eggs in programs, although such cases may involve official forbearance, if not actual permission. == Examples == === Worms === Many computer worms, such as Sobig and Mydoom, install a backdoor on the affected computer (generally a PC on broadband running Microsoft Windows and Microsoft Outlook). Such backdoors appear to be installed so that spammers can send junk e-mail from the infected machines. Others, such as the Sony/BMG rootkit, placed secretly on millions of music CDs through late 2005, are intended as DRM measures—and, in that case, as data-gathering agents, since both surreptitious programs they installed routinely contacted central servers. A sophisticated attempt to plant a backdoor in the Linux kernel, exposed in November 2003, added a small and subtle code change by subverting the revision control system. In this case, a two-line change appeared to check root access permissions of a caller to the sys_wait4 function, but because it used assignment = instead of equality checking ==, it actually granted permissions to the system. This difference is easily overlooked, and could even be interpreted as an accidental typographical error, rather than an intentional attack. In January 2014, a backdoor was discovered in certain Samsung Android products, like the Galaxy devices. The Samsung proprietary Android versions are fitted with a backdoor that provides remote access to the data stored on the device. In particular, the Samsung Android software that is in charge of handling the communications with the modem, using the Samsung IPC protocol, implements a class of requests known as remote file server (RFS) commands, that allows the backdoor operator to perform via modem remote I/O operations on the device hard disk or other storage. As the modem is running Samsung proprietary Android software, it is likely that it offers over-the-air remote control that could then be used to issue the RFS commands and thus to access the file system on the device. === Object code backdoors === Harder to detect backdoors involve modifying object code, rather than source code—object code is much harder to inspect, as it is designed to be machine-readable, not human-readable. These backdoors can be inserted either directly in the on-disk object code, or inserted at some point during compilation, assembly linking, or loading—in the latter case the backdoor never appears on disk, only in memory. Object code backdoors are difficult to detect by inspection of the object code, but are easily detected by simply checking for changes (differences), notably in length or in checksum, and in some cases can be detected or analyzed by disassembling the object code. Further, object code backdoors can be removed (assuming source code is available) by simply recompiling from source on a trusted system. Thus for such backdoors to avoid detection, all extant copies of a binary must be subverted, and any validation checksums must also be compromised, and source must be unavailable, to prevent recompilation. Alternatively, these other tools (length checks, diff, checksumming, disassemblers) can themselves be compromised to conceal the backdoor, for example detecting that the subverted binary is being checksummed and returning the expected value, not the actual value. To conceal these further subversions, the tools must also conceal the changes in themselves—for example, a subverted checksummer must also detect if it is checksumming itself (or other subverted tools) and return false values. This leads to extensive changes in the system and tools being needed to conceal a single change. As object code can be regenerated by recompiling (reassembling, relinking) the original source code, making a persistent object code backdoor (without modifying source code) requires subverting the compiler itself—so that when it detects that it is compiling the program under attack it inserts the backdoor—or alternatively the assembler, linker, or loader. As this requires subverting the compiler, this in turn can be fixed by recompiling the compiler, removing the backdoor insertion code. This defense can in turn be subverted by putting a source meta-backdoor in the compiler, so that when it detects that it is compiling itself

    Read more →
  • Weird SoundCloud

    Weird SoundCloud

    Weird SoundCloud, or SoundClown, is a mashup parody music scene taking place on the online distribution platform SoundCloud. The scene has been described by its producers and music journalists to be a satirical take on electronic dance music, and useless, throwaway internet content. One critic, Audra Schroeder, categorized it as an in-joke that is "deconstructing and reshaping memes and popular music, recontextualizing the sacred texts of millennial chat rooms." == Origins == In a January 2014 interview, DJ Kevin Wang suggested that the Weird SoundCloud has "been around in the last one to two years", but started to gain much more popularity the previous year through electronic dance music internet blogs. Weird SoundCloud producer Ideaot suggested that some in the phenomenon came from the YouTube poop scene. Another producer in the community, DJ @@ (AT-AT), reasoned that producers joining the scene "want to express their musicality, see it as a more mature form of YouTube Poop," or are "just looking for recognition on social media sites." AT-AT said that it was "a fun thing to do, and after I stopped making proper music I felt I needed a bit of an outlet for my creativity. The fact that people enjoyed it and/or treated it as a travesty (Direct quote from one of my tracks) spurs me on." == Characteristics == Weird SoundCloud is a mash-up and parody music genre labeled by journalist Audra Schroeder as an in-joke that is "deconstructing and reshaping memes and popular music, recontextualizing the sacred texts of millennial chat rooms." Most tracks range from around 30 seconds to one minute in length. The people who make weird SoundCloud are known as SoundClowns, a term coined by producer Dicksoak. Ideaot described the weird SoundCloud community as "largely just people who are friends with each other." Noisey critic Ryan Bassil spotlight the variety of music coming out of the weird SoundCloud landscape: "One minute you could be listening to the Seinfeld theme reimagined as an aneurysm inducing dubstep corker, the next, you're recovering from hearing a version of Tenacious D's "Tribute" that's akin to having a stroke." Bassil analyzes that the tracks "often take the past and repurpose it into something that, although not altogether useful, sounds fresh and reflective of the abstract, confusing panoramic that encapsulates the modern internet." Bassil compared the lexicon of SoundClown's track titles to that of Reddit and Twitter users. According to Dicksoak, most works of the style are critiques of EDM or "are just uploaded because they sound funny." However, Bassil disagreed, writing that there are also many tracks that keep repurposing a certain meme, such as "mom's spaghetti" or the re-use of vocals from recordings by hip hop group Death Grips. He describe the scene's re-use of memes as a satirical take on pointless online content that is only on the internet to "do nothing other than fill the void": They're changing the format of the original work's intended message or audience - a technique often employed by top-tier digital media companies - and in doing so they're sarcastically, ironically, taking the piss out of what Web 2.0's turned into - an open arena where the most ridiculous, unashamed, often pointless piggy-back content can rack up thousands and thousands of clicks. == Notable examples == There are mash-ups that "disrupt the flow of popular music", in the words of writer Schroeder, such as a "flutedrop" remix of the Miley Cyrus song "Wrecking Ball" and Shaliek's mashup of music by Bruno Mars and Korn. In November 2013, Wang released a set of mp3 files on SoundCloud named Best Drops Ever, which included tracks like "A Drop So Epic a Bunch of NYU Bros Already Bought a 3-Day Weekend Pass for It" and "A Drop So Crazy You'll Kill Your Family". All of the tracks start as normal electronic dance music build-ups, before they drop into a "bait and switch" audio or film clip such as Filet-O-Fish commercials, the Whitney Houston song "I Will Always Love You" and the film Bambi (1942) that ruins the anticipation. The collection is a parody of the over-importance and over-focus of the drop and lack of care of the overall quality of a song common in the modern electronic dance music scene. Wang has released more than 45 tracks in the weird SoundCloud, some of them receiving around a million plays. Subgenres of Weird SoundCloud include Macklecore, mash-ups and remixes that include the works of American hip-hop recording artist Macklemore, and Biggiewave, which include samples of songs from the album Ready to Die (1994) by The Notorious B.I.G. Common audio and meme sources used include Skrillex, the Martin Garrix track "Animals", Thomas the Tank Engine, Shrek, Macklemore, "Gangnam Style", the Bruno Mars track "Uptown Funk", the Disturbed track "Down with the Sickness", Space Jam, the Childish Gambino track "Bonfire", the Death Grips track "Takyon" and air horn sound effects. == Reception == Bassil praised the SoundClown scene as "loveable and strangely honest", reasoning that it "just reminds me that we're all humans on the internet, all searching for #content that means something, something to connect with, but usually only dredging up bastardised versions of things we've already read, seen, or watched before." Bassil also described the weird SoundCloud as a more successful version of a similar scene known as weird YouTube; the reason for the success of SoundClowns is due to SoundCloud's discovery algorithm: "Small collectives and trends are able to form, and there's an abundance of tracks from artists who are almost forging careers out of it, as opposed to uploading one viral hit." Publications have made lists of weird SoundCloud works, such as BuzzFeed's "23 Of The Weirdest Songs On Soundcloud", Obsev's "Weird SoundCloud Mashups That Must've Been Made While Drunk", and Thump's "9 of the Best and Most Upsetting Soundclowns we Could Find", where writer Isabelle Hellyer called it the "most influential genre of music in human history." A Your EDM writer called it "oddly addicting."

    Read more →
  • Stability (learning theory)

    Stability (learning theory)

    Stability, also known as algorithmic stability, is a notion in computational learning theory of how a machine learning algorithm output is changed with small perturbations to its inputs. A stable learning algorithm is one for which the prediction does not change much when the training data is modified slightly. For instance, consider a machine learning algorithm that is being trained to recognize handwritten letters of the alphabet, using 1000 examples of handwritten letters and their labels ("A" to "Z") as a training set. One way to modify this training set is to leave out an example, so that only 999 examples of handwritten letters and their labels are available. A stable learning algorithm would produce a similar classifier with both the 1000-element and 999-element training sets. Stability can be studied for many types of learning problems, from language learning to inverse problems in physics and engineering, as it is a property of the learning process rather than the type of information being learned. The study of stability gained importance in computational learning theory in the 2000s when it was shown to have a connection with generalization. It was shown that for large classes of learning algorithms, notably empirical risk minimization algorithms, certain types of stability ensure good generalization. == History == A central goal in designing a machine learning system is to guarantee that the learning algorithm will generalize, or perform accurately on new examples after being trained on a finite number of them. In the 1990s, milestones were reached in obtaining generalization bounds for supervised learning algorithms. The technique historically used to prove generalization was to show that an algorithm was consistent, using the uniform convergence properties of empirical quantities to their means. This technique was used to obtain generalization bounds for the large class of empirical risk minimization (ERM) algorithms. An ERM algorithm is one that selects a solution from a hypothesis space H {\displaystyle H} in such a way to minimize the empirical error on a training set S {\displaystyle S} . A general result, proved by Vladimir Vapnik for an ERM binary classification algorithms, is that for any target function and input distribution, any hypothesis space H {\displaystyle H} with VC-dimension d {\displaystyle d} , and n {\displaystyle n} training examples, the algorithm is consistent and will produce a training error that is at most O ( d n ) {\displaystyle O\left({\sqrt {\frac {d}{n}}}\right)} (plus logarithmic factors) from the true error. The result was later extended to almost-ERM algorithms with function classes that do not have unique minimizers. Vapnik's work, using what became known as VC theory, established a relationship between generalization of a learning algorithm and properties of the hypothesis space H {\displaystyle H} of functions being learned. However, these results could not be applied to algorithms with hypothesis spaces of unbounded VC-dimension. Put another way, these results could not be applied when the information being learned had a complexity that was too large to measure. Some of the simplest machine learning algorithms—for instance, for regression—have hypothesis spaces with unbounded VC-dimension. Another example is language learning algorithms that can produce sentences of arbitrary length. Stability analysis was developed in the 2000s for computational learning theory and is an alternative method for obtaining generalization bounds. The stability of an algorithm is a property of the learning process, rather than a direct property of the hypothesis space H {\displaystyle H} , and it can be assessed in algorithms that have hypothesis spaces with unbounded or undefined VC-dimension such as nearest neighbor. A stable learning algorithm is one for which the learned function does not change much when the training set is slightly modified, for instance by leaving out an example. A measure of Leave one out error is used in a Cross Validation Leave One Out (CVloo) algorithm to evaluate a learning algorithm's stability with respect to the loss function. As such, stability analysis is the application of sensitivity analysis to machine learning. == Summary of classic results == Early 1900s - Stability in learning theory was earliest described in terms of continuity of the learning map L {\displaystyle L} , traced to Andrey Nikolayevich Tikhonov. 1979 - Devroye and Wagner observed that the leave-one-out behavior of an algorithm is related to its sensitivity to small changes in the sample. 1999 - Kearns and Ron discovered a connection between finite VC-dimension and stability. 2002 - In a landmark paper, Bousquet and Elisseeff proposed the notion of uniform hypothesis stability of a learning algorithm and showed that it implies low generalization error. Uniform hypothesis stability, however, is a strong condition that does not apply to large classes of algorithms, including ERM algorithms with a hypothesis space of only two functions. 2002 - Kutin and Niyogi extended Bousquet and Elisseeff's results by providing generalization bounds for several weaker forms of stability which they called almost-everywhere stability. Furthermore, they took an initial step in establishing the relationship between stability and consistency in ERM algorithms in the Probably Approximately Correct (PAC) setting. 2004 - Poggio et al. proved a general relationship between stability and ERM consistency. They proposed a statistical form of leave-one-out-stability which they called CVEEEloo stability, and showed that it is a) sufficient for generalization in bounded loss classes, and b) necessary and sufficient for consistency (and thus generalization) of ERM algorithms for certain loss functions such as the square loss, the absolute value and the binary classification loss. 2010 - Shalev Shwartz et al. noticed problems with the original results of Vapnik due to the complex relations between hypothesis space and loss class. They discuss stability notions that capture different loss classes and different types of learning, supervised and unsupervised. 2016 - Moritz Hardt et al. proved stability of gradient descent given certain assumption on the hypothesis and number of times each instance is used to update the model. == Preliminary definitions == We define several terms related to learning algorithms training sets, so that we can then define stability in multiple ways and present theorems from the field. A machine learning algorithm, also known as a learning map L {\displaystyle L} , maps a training data set, which is a set of labeled examples ( x , y ) {\displaystyle (x,y)} , onto a function f {\displaystyle f} from X {\displaystyle X} to Y {\displaystyle Y} , where X {\displaystyle X} and Y {\displaystyle Y} are in the same space of the training examples. The functions f {\displaystyle f} are selected from a hypothesis space of functions called H {\displaystyle H} . The training set from which an algorithm learns is defined as S = { z 1 = ( x 1 , y 1 ) , . . , z m = ( x m , y m ) } {\displaystyle S=\{z_{1}=(x_{1},\ y_{1})\ ,..,\ z_{m}=(x_{m},\ y_{m})\}} and is of size m {\displaystyle m} in Z = X × Y {\displaystyle Z=X\times Y} drawn i.i.d. from an unknown distribution D. Thus, the learning map L {\displaystyle L} is defined as a mapping from Z m {\displaystyle Z_{m}} into H {\displaystyle H} , mapping a training set S {\displaystyle S} onto a function f S {\displaystyle f_{S}} from X {\displaystyle X} to Y {\displaystyle Y} . Here, we consider only deterministic algorithms where L {\displaystyle L} is symmetric with respect to S {\displaystyle S} , i.e. it does not depend on the order of the elements in the training set. Furthermore, we assume that all functions are measurable and all sets are countable. The loss V {\displaystyle V} of a hypothesis f {\displaystyle f} with respect to an example z = ( x , y ) {\displaystyle z=(x,y)} is then defined as V ( f , z ) = V ( f ( x ) , y ) {\displaystyle V(f,z)=V(f(x),y)} . The empirical error of f {\displaystyle f} is I S [ f ] = 1 n ∑ V ( f , z i ) {\displaystyle I_{S}[f]={\frac {1}{n}}\sum V(f,z_{i})} . The true error of f {\displaystyle f} is I [ f ] = E z V ( f , z ) {\displaystyle I[f]=\mathbb {E} _{z}V(f,z)} Given a training set S of size m, we will build, for all i = 1....,m, modified training sets as follows: By removing the i-th element S | i = { z 1 , . . . , z i − 1 , z i + 1 , . . . , z m } {\displaystyle S^{|i}=\{z_{1},...,\ z_{i-1},\ z_{i+1},...,\ z_{m}\}} By replacing the i-th element S i = { z 1 , . . . , z i − 1 , z i ′ , z i + 1 , . . . , z m } {\displaystyle S^{i}=\{z_{1},...,\ z_{i-1},\ z_{i}',\ z_{i+1},...,\ z_{m}\}} == Definitions of stability == === Hypothesis Stability === An algorithm L {\displaystyle L} has hypothesis stability β with respect to the loss function V if the following holds: ∀ i ∈ { 1 , . . . , m } , E S , z [ | V ( f S , z ) − V ( f S |

    Read more →
  • Data steward

    Data steward

    A data steward is an oversight or data governance role within an organization, and is responsible for ensuring the quality and fitness for purpose of the organization's data assets, including the metadata for those data assets. A data steward may share some responsibilities with a data custodian, such as the awareness, accessibility, release, appropriate use, security and management of data. A data steward would also participate in the development and implementation of data assets. A data steward may seek to improve the quality and fitness for purpose of other data assets their organization depends upon but is not responsible for. Data stewards have a specialist role that utilizes an organization's data governance processes, policies, guidelines and responsibilities for administering an organizations' entire data in compliance with policy and/or regulatory obligations (e.g., GDPR, HIPAA). The overall objective of a data steward is the data quality of the data assets, datasets, data records and data elements. This includes documenting metainformation for the data, such as definitions, related rules/governance, physical manifestation, and related data models (most of these properties being specific to an attribute/concept relationship), identifying owners/custodian's various responsibilities, relations insight pertaining to attribute quality, aiding with project requirement data facilitation and documentation of capture rules. Data stewards begin the stewarding process with the identification of the data assets and elements which they will steward, with the ultimate result being standards, controls and data entry. The steward works closely with business glossary standards analysts (for standards), with data architect/modelers (for standards), with DQ analysts (for controls) and with operations team members (good-quality data going in per business rules) while entering data. Data stewardship roles are common when organizations attempt to exchange data precisely and consistently between computer systems and to reuse data-related resources. Master data management often makes references to the need for data stewardship for its implementation to succeed. Data stewardship must have precise purpose, fit for purpose or fitness. == Data steward responsibilities == A data steward ensures that each assigned data element: Has clear and unambiguous data element definition Does not conflict with other data elements in the metadata registry (removes duplicates, overlap etc.) Has clear enumerated value definitions if it is of type Code Is still being used (remove unused data elements) Is being used consistently in various computer systems Is being used, fit for purpose = Data Fitness Has adequate documentation on appropriate usage and notes Documents the origin and sources of authority on each metadata element Is protected against unauthorised access or change Responsibilities of data stewards vary between different organisations and institutions. For example, at Delft University of Technology, data stewards are perceived as the first contact point for any questions related to research data. They also have subject-specific background allowing them to easily connect with researchers and to contextualise data management problems to take into account disciplinary practices. == Types of data stewards == Depending on the set of data stewardship responsibilities assigned to an individual, there are 4 types (or dimensions of responsibility) of data stewards typically found within an organization: Data object data steward - responsible for managing reference data and attributes of one business data entity Business data steward - responsible for managing critical data, both reference and transactional, created or used by one business function. The data steward may also serve as a liaison between the organization's data users and technical teams, helping to bridge the gap between business needs and technical requirements. They may also play a role in educating others within the organization about best practices for data management, and advocating for data-driven decision-making. Process data steward - responsible for managing data across one business process System data steward - responsible for managing data for at least one IT system == Benefits of data stewardship == Systematic data stewardship can foster: Faster analysis Consistent use of data management resources Easy mapping of data between computer systems and exchange documents Lower costs associated with migration to (for example) service-oriented architecture (SOA) Mitigation of data risk Better control of dangers associated with privacy, legal, errors, etc. Assignment of each data element to a person sometimes seems like an unimportant process. But multiple groups have found that users have greater trust and usage rates in systems where they can contact a person with questions on each data element. == Examples == Delft University of Technology (TU Delft) offers an example of data stewardship implementation at a research institution. In 2017 the Data Stewardship Project was initiated at TU Delft to address research data management needs in a disciplinary manner across the whole campus. Dedicated data stewards with subject-specific background were appointed at every TU Delft faculty to support researchers with data management questions and to act as a linking point with the other institutional support services. The project is coordinated centrally by TU Delft Library, and it has its own website, blog and a YouTube channel. The [1]EPA metadata registry furnishes an example of data stewardship. Note that each data element therein has a "POC" (point of contact). In 2023, ETH Zurich launched the Data Stewardship Network (DSN) to facilitate collaboration among employees engaged in data management, analysis, and code development across research groups. The DSN serves as a platform for networking and knowledge exchange, aiming to professionalize the role of data stewards who support research data management and reproducible workflows. Established by the team for Research Data Management and Digital Curation at the ETH Library, the DSN collaborates with Scientific IT Services to provide expertise in areas such as storage infrastructure and reproducible workflows. == Data stewardship applications == Information stewardship applications are business solutions used by business users acting in the role of information steward (interpreting and enforcing information governance policy, for example). These developing solutions represent, for the most part, an amalgam of a number of disparate, previously IT-centric tools already on the market, but are organized and presented in such a way that information stewards (a business role) can support the work of information policy enforcement as part of their normal, business-centric, day-to-day work in a range of use cases. The initial push for the formation of this new category of packaged software came from operational use cases — that is, use of business data in and between transactional and operational business applications. This is where most of the master data management efforts are undertaken in organizations. However, there is also now a faster-growing interest in the new data lake arena for more analytical use cases.

    Read more →
  • Netsukuku

    Netsukuku

    Netsukuku is an experimental peer-to-peer routing system, developed by the FreakNet MediaLab in 2005, created to build up a distributed network, anonymous and censorship-free, fully independent but not necessarily separated from the Internet, without the support of any server, Internet service provider and no central authority. Netsukuku is designed to handle up to 2128 nodes without any servers or central systems, with minimal CPU and memory resources. This mesh network can be built using existing network infrastructure components such as Wi-Fi. The project has been in slow development since 2005, never abandoning a beta state. It has also never been tested on large scale. == Operation == As of December 2011, the latest theoretical work on Netsukuku could be found in the author's master thesis Scalable Mesh Networks and the Address Space Balancing problem. The following description takes into account only the basic concepts of the theory. Netsukuku uses a custom routing protocol called QSPN (Quantum Shortest Path Netsukuku) that strives to be efficient and not taxing on the computational capabilities of each node. The current version of the protocol is QSPNv2. It adopts a hierarchical structure. 256 nodes are grouped inside a gnode (group node), 256 gnodes are grouped in a single ggnode (group of group nodes), 256 ggnodes are grouped in a single gggnode, and so on. This offers a set of advantages main documentation. The protocol relies on the fact that the nodes are not mobile and that the network structure does not change quickly, as several minutes may be required before a change in the network is propagated. However, a node that joins the network is immediately able to communicate using the routes of its neighbors. When a node joins the mesh network, Netsukuku automatically adapts and all other nodes come to know the fastest and most efficient routes to communicate with the newcomer. Each node has no more privileges or restrictions than the other nodes. The domain name system (DNS) is replaced by a decentralised and distributed system called ANDNA (Abnormal Netsukuku Domain Name Anarchy). The ANDNA database is included in the Netsukuku system, so each node includes such database that occupies at most 355 kilobytes of memory. Simplifying, ANDNA works as follows: to resolve a symbolic name the host applies a function Hash on its behalf. The Hash function returns an address that the host contacts asking for the resolution generated by the hash. The contacted node receives a request, searches in its ANDNA database for the address associated with the name and returns it to the applicant host. Recording works in a similar way: for example, let's suppose that the node X wants to register the address FreakNet.andna; X calculates the hash name and obtains the address 11.22.33.44 associated with node Y. The node X contacts Y asking to register 11.22.33.44 as its own. Y stores the request in its database and any request for resolution of 11.22.33.44 hash, will answer with the X's address. The protocol is a little more complex than this, as the system provides a public/private key to authenticate the hosts and prevent unauthorized changes to the ANDNA database. Furthermore, the protocol provides redundancy in the database to make the protocol resistant to failure and also provides for the migration of the database if the network topology changes. The protocol does not provide for the possibility of revoking a symbolic name; after a certain period of inactivity (currently 3 days) it is simply deleted from the database. The protocol also prevents a single host from recording an excessive number of symbolic names (at present 256 names) in order to prevent spammers from storing a high number of terms to perform cybersquatting.

    Read more →