AI App Kya Hai In Hindi

AI App Kya Hai In Hindi — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • D4Science

    D4Science

    D4Science is a Data Infrastructure offering services by community-driven virtual research environments. In particular, it supports communities of practice willing to implement open science practices, thus it is an Open Science Infrastructure. The infrastructure follows the system of systems approach, where the constituent systems (Service providers) offer "resources" (namely services and by them data, computing, storage) assembled together to implement the overall set of D4Science services. In particular, D4Science aggregates "domain agnostic" service providers as well as community-specific ones to build a unifying space where the aggregated resources can be exploited via Virtual research Environments and their services. It is spread across several sites, the primary one is hosted by the Istituto di Scienza e Tecnologie dell'Informazione of National Research Council (Italy). At the earth of this infrastructure there is an Open Source Software named gCube system. == Services == D4Science offers: Virtual Research Environment as a Service providing any community of practice with a dedicated working environment supporting any knowledge production process in a collaborative way, in fact every VRE enables computer-supported cooperative work by design. D4Science-based VREs are web-based, community-oriented, collaborative, user-friendly, open-science-enabler working environments for scientists and practitioners willing to work together to perform a set of (research) task. From the end-user perspective, each VRE manifests in a unifying web application (and a set of application programming interfaces (APIs)): (a) comprising several applications organised in specific menu items and (b) running in a plain web browser. Every application is providing VRE users with facilities implemented by relying on one or more services provisioned by diverse providers. Among the basic services every VRE is equipped with there are a Social Networking area enabling collaborative and open discussions on any topic and disseminating information of interest for the community, for example, the availability of a research outcome; a Workspace for storing, organizing and sharing any version of a research artifact, including dataset and model implementation; a User Management dashboard for managing membership and roles; a Catalogue Service recording the assets worth being published thus to make it possible for others to be informed and make use of these assets. Science Gateway as a Service providing a community of practice with a dedicated science gateway hosting a selected set of virtual research environments. Data Analytics at scale for data analytics including: a proprietary data analytics platform (DataMiner) to execute analytics tasks either by relying on methods provided by the user or by others. It is endowed with importing and sharing facilities for analytics methods implemented in heterogeneous forms including R, Java, Python, and KNIME. The platform enacts tasks execution by a distributed and hybrid computing infrastructure. Moreover, one of the worth highlighting feature of this platform is its open science-friendliness. All the analytics methods integrated in it are exposed by a standard protocol (the OGC WPS protocol) clients can use to get informed on available methods as well as to start processes, monitor their execution and access results. Every analytics task performed by the platform automatically produces a provenance record catering for the reproducibility of the task; an RStudio-based development environment for R enabling to perform statistical computing tasks in the cloud. This RStudio environment is (i) preconfigured with libraries and packages to ease the execution of common data analytics tasks, and (ii) provides seamless access to the VRE Workspace enabling sharing of resources with other members of the same working environment. a Jupyter-based notebook environment for developing and executing interactive computing by JupyterLab instances. Each JupyterLab is (i) preconfigured with libraries and packages to ease the execution of common data analytics tasks, and (ii) provides access to the VRE Workspace enabling sharing of resources with other members of the same working environment. == Community == The D4Science Infrastructure serves more than 24,000 registered users (August 2024) through 177 active VREs offered via 20 Science gateways. This extensive infrastructure not only supports a diverse range of scientific communities but also fosters significant engagement and collaboration among researchers worldwide. Engagement within the D4Science community is robust, with users benefiting from user-friendly application environments tailored to their specific needs. The platform allows users to securely preserve, access, and share their data from anywhere, fostering a collaborative and inclusive research environment. Additionally, groups of users can create their own virtual environments and customise them with the applications they need, further enhancing the platform's flexibility and usability. Supported communities and cases range from Agri-food to Social Data Science, Earth Science and Marine Science. These diverse applications demonstrate the versatility and broad applicability of the D4Science Infrastructure, making it an invaluable resource for researchers across various scientific domains. == History == The D4Science development has been supported by several European-funded projects. DILIGENT (2004-2007) in the Sixth Framework Programme for Research and Technological Development was the forerunner where a testbed infrastructure built by integrating digital library and grid computing technologies and resources was conceived and developed to serve the needs of communities of practice involved in knowledge development. In the context of the Seventh Framework Programme for research, technological development and demonstration the development of the D4Science initiative. In this period the infrastructure was established and developed to serve communities of practices from domains ranging from Earth Science to Marine Science with worldwide scope In the context of the H2020 research and innovation programme the maturity level of the D4Science infrastructure was high enough to allow a large and very diverse set of communities of practice to benefit from it and its services and further contribute to its development. Moreover, the services offered by the infrastructure have been developed to support open science practices. The operation and improvement of the D4Science infrastructure facilities are still ongoing while its exploitation is progressively growing.

    Read more →
  • Data recovery

    Data recovery

    In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or overwritten data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS). Logical failures occur when the hard drive devices are functional but the user or automated-OS cannot retrieve or access data stored on them. Logical failures can occur due to corruption of the engineering chip, lost partitions, firmware failure, or failures during formatting/re-installation. Data recovery can be a very simple or technical challenge. This is why there are specific software companies specialized in this field that help to get back data on your system. == About == The most common data recovery scenarios involve an operating system failure, malfunction of a storage device, logical failure of storage devices, accidental damage or deletion, etc. (typically, on a single-drive, single-partition, single-OS system), in which case the ultimate goal is simply to copy all important files from the damaged media to another new drive. This can be accomplished using a Live CD, or DVD by booting directly from a ROM or a USB drive instead of the corrupted drive in question. Many Live CDs or DVDs provide a means to mount the system drive and backup drives or removable media, and to move the files from the system drive to the backup media with a file manager or optical disc authoring software. Such cases can often be mitigated by disk partitioning and consistently storing valuable data files (or copies of them) on a different partition from the replaceable OS system files. Another scenario involves a drive-level failure, such as a compromised file system or drive partition, or a hard disk drive failure. In any of these cases, the data is not easily read from the media devices. Depending on the situation, solutions involve repairing the logical file system, partition table, or master boot record, or updating the firmware or drive recovery techniques ranging from software-based recovery of corrupted data, to hardware- and software-based recovery of damaged service areas (also known as the hard disk drive's "firmware"), to hardware replacement on a physically damaged drive which allows for the extraction of data to a new drive. If a drive recovery is necessary, the drive itself has typically failed permanently, and the focus is rather on a one-time recovery, salvaging whatever data can be read. In a third scenario, files have been accidentally "deleted" from a storage medium by the users. Typically, the contents of deleted files are not removed immediately from the physical drive; instead, references to them in the directory structure are removed, and thereafter space the deleted data occupy is made available for later data overwriting. In the mind of end users, deleted files cannot be discoverable through a standard file manager, but the deleted data still technically exists on the physical drive. In the meantime, the original file contents remain, often several disconnected fragments, and may be recoverable if not overwritten by other data files. The term "data recovery" is also used in the context of forensic applications or espionage, where data which have been encrypted, hidden, or deleted, rather than damaged, are recovered. Sometimes data present in the computer gets encrypted or hidden due to reasons like virus attacks which can only be recovered by some computer forensic experts. == Physical damage == A wide variety of failures can cause physical damage to storage media, which may result from human errors and natural disasters. CD-ROMs can have their metallic substrate or dye layer scratched off; hard disks can suffer from a multitude of mechanical failures, such as head crashes, PCB failure, and failed motors; tapes can simply break. Physical damage to a hard drive, even in cases where a head crash has occurred, does not necessarily mean permanent data loss. However, in extreme cases, such as prolonged exposure to moisture and corrosion —like the lost Bitcoin hard drive of James Howells, buried in the Newport landfill for over a decade — recovery is usually impossible. In rare cases, forensic techniques such as magnetic force microscopy (MFM) have been explored to detect residual magnetic traces when data holds exceptional value. Other techniques employed by many professional data recovery companies can typically salvage most, if not all, of the data that had been lost when the failure occurred. Of course, there are exceptions to this, such as cases where severe damage to the hard drive platters may have occurred. However, if the hard drive can be repaired and a full image or clone created, then the logical file structure can be rebuilt in most instances. Most physical damage cannot be repaired by end users. For example, opening a hard disk drive in a normal environment can allow airborne dust to settle on the platter and become caught between the platter and the read/write head. During normal operation, read/write heads float 3 to 6 nanometers above the platter surface, and the average dust particles found in a normal environment are typically around 30,000 nanometers in diameter. When these dust particles get caught between the read/write heads and the platter, they can cause new head crashes that further damage the platter and thus compromise the recovery process. Furthermore, end users generally do not have the hardware or technical expertise required to make these repairs. Consequently, data recovery companies are often employed to salvage important data with the more reputable ones using class 100 dust- and static-free cleanrooms. === Recovery techniques === Recovering data from physically damaged hardware can involve multiple techniques. Some damage can be repaired by replacing parts in the hard disk. This alone may make the disk usable, but there may still be logical damage. A specialized disk-imaging procedure is used to recover every readable bit from the surface. Once this image is acquired and saved on a reliable medium, the image can be safely analyzed for logical damage and will possibly allow much of the original file system to be reconstructed. ==== Hardware repair ==== A common misconception is that a damaged printed circuit board (PCB) may be simply replaced during recovery procedures by an identical PCB from a healthy drive. While this may work in rare circumstances on hard disk drives manufactured before 2003, it will not work on newer drives. Electronics boards of modern drives usually contain drive-specific adaptation data (generally a map of bad sectors and tuning parameters) and other information required to properly access data on the drive. Replacement boards often need this information to effectively recover all of the data. The replacement board may need to be reprogrammed. Some manufacturers (Seagate, for example) store this information on a serial EEPROM chip, which can be removed and transferred to the replacement board. Each hard disk drive has what is called a system area or service area; this portion of the drive, which is not directly accessible to the end user, usually contains drive's firmware and adaptive data that helps the drive operate within normal parameters. One function of the system area is to log defective sectors within the drive; essentially telling the drive where it can and cannot write data. The sector lists are also stored on various chips attached to the PCB, and they are unique to each hard disk drive. If the data on the PCB do not match what is stored on the platter, then the drive will not calibrate properly. In most cases the drive heads will click because they are unable to find the data matching what is stored on the PCB. == Logical damage == The term "logical damage" refers to situations in which the error is not a problem in the hardware and requires software-level solutions. === Corrupt partitions and file systems, media errors === In some cases, data on a hard disk drive can be unreadable due to damage to the partition table or file system, or to (intermittent) media errors. In the majority of these cases, at least a portion of the original data can be recovered by repairing the damaged partition table or file system using specialized data recovery software such as TestDisk; software like ddrescue can image media despite intermittent errors, and image raw data when there is partition table or file system damage. This type of data recovery can be performed by people without expertise in drive hardware as it requires no special physica

    Read more →
  • Master/Session

    Master/Session

    In cryptography, Master/Session is a key management scheme in which a pre-shared Key Encrypting Key (called the "Master" key) is used to encrypt a randomly generated and insecurely communicated Working Key (called the "Session" key). The Working Key is then used for encrypting the data to be exchanged. Its advantage is simplicity, but it suffers the disadvantage of having to communicate the pre-shared Key Exchange Key, which can be difficult to update in the event of compromise. The Master/Session technique was created in the days before asymmetric techniques, such as Diffie-Hellman, were invented. This technique still finds widespread use in the financial industry, and is routinely used between corporate parties such as issuers, acquirers, switches. Its use in device communications (such as PIN pads), however, is in decline given the advantages of techniques such as DUKPT.

    Read more →
  • Data storage

    Data storage

    Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are considered by some as data storage. Recording may be accomplished with virtually any form of energy. Electronic data storage requires electrical power to store and retrieve data. Data stored in a digital, machine-readable medium is called digital data. Computer data storage is one of the core functions of a general-purpose computer. Electronic documents can be stored in much less space than paper documents. Barcodes and magnetic ink character recognition (MICR) are two ways of recording machine-readable data on paper. == Recording media == A recording medium is physical material that holds information. Newly created information is distributed and can be stored in four storage media–print, film, magnetic, and optical–and seen or heard in four information flows–telephone, radio, TV, and the Internet as well as being observed directly. Digital information is stored on electronic media in many different recording formats. With electronic media, the data and the recording media are sometimes referred to as "software" despite the more common use of the word to describe computer software. With (traditional art) static media, art materials such as crayons may be considered both equipment and medium as the wax, charcoal or chalk material from the equipment becomes part of the surface of the medium. Some recording media may be temporary, either by design or by nature. Volatile organic compounds may be used to purposely make data expire over time or to reduce environmental impact. Data such as smoke signals or skywriting are temporary by nature. Depending on the volatility, a gas (e.g., atmosphere, smoke) or a liquid surface such as a lake would be considered a temporary recording medium, if it could be considered a recording medium at all. == Global capacity, digitization, and trends == A 2003 UC Berkeley report estimated that about five exabytes of new information were produced in 2002 and that 92% of this data was stored on magnetic media (primarily hard disk drives). This was about twice the data produced in 1999. The amount of data transmitted over telecommunications systems in 2002 was nearly 18 exabytes—three and a half times more than was recorded on non-volatile storage. Telephone calls constituted 98% of the telecommunicated information in 2002. The researchers' highest estimate for the growth rate of newly stored information (uncompressed) was more than 30% per year. In a more limited study, the International Data Corporation estimated that the total amount of digital data in 2007 was 281 exabytes and that the total amount of digital data produced exceeded the global storage capacity for the first time. A 2011 article in Science estimated that the year 2002 was the beginning of the digital age for information storage: an age in which more information is stored on digital storage devices than on analog storage devices. In 1986, approximately 1% of the world's capacity to store information was in digital format; this grew to 3% by 1993, to 25% by 2000, and to 94% by 2007. These figures correspond to less than three compressed exabytes in 1986, and 295 compressed exabytes in 2007. The quantity of digital storage doubled roughly every three to four years. It is estimated that around 120 zettabytes of data will be generated in 2023, an increase of 60x from 2010, and that it will increase to 181 zettabytes generated in 2025. == Mass storage ==

    Read more →
  • Glossary of machine vision

    Glossary of machine vision

    The following are common definitions related to the machine vision field. General related fields Machine vision Computer vision Image processing Signal processing == 0-9 == 1394. FireWire is Apple Inc.'s brand name for the IEEE 1394 interface. It is also known as i.Link (Sony's name) or IEEE 1394 (although the 1394 standard also defines a backplane interface). It is a personal computer (and digital audio/digital video) serial bus interface standard, offering high-speed communications and isochronous real-time data services. 1D. One-dimensional. 2D computer graphics. The computer-based generation of digital images—mostly from two-dimensional models (such as 2D geometric models, text, and digital images) and by techniques specific to them. 3D computer graphics. 3D computer graphics are different from 2D computer graphics in that a three-dimensional representation of geometric data is stored in the computer for the purposes of performing calculations and rendering 2D images. Such images may be for later display or for real-time viewing. Despite these differences, 3D computer graphics rely on many of the same algorithms as 2D computer vector graphics in the wire frame model and 2D computer raster graphics in the final rendered display. In computer graphics software, the distinction between 2D and 3D is occasionally blurred; 2D applications may use 3D techniques to achieve effects such as lighting, and primarily 3D may use 2D rendering techniques. 3D scanner. This is a device that analyzes a real-world object or environment to collect data on its shape and possibly color. The collected data can then be used to construct digital, three dimensional models useful for a wide variety of applications. == A == Aberration. Optically, defocus refers to a translation along the optical axis away from the plane or surface of best focus. In general, defocus reduces the sharpness and contrast of the image. What should be sharp, high-contrast edges in a scene become gradual transitions. Algebraic distance or algebraic error. The algebraic distance from a point xi to a curve or surface defined by f ( x , β ) = 0 {\displaystyle f(x,\beta )=0} is the value of f ( x i , β ) {\displaystyle f(x_{i},\beta )} , i.e. the residual in the least squares problem with data point (xi, 0) and model function f. This term is mainly used in computer vision.[1][2] Aperture. In context of photography or machine vision, aperture refers to the diameter of the aperture stop of a photographic lens. The aperture stop can be adjusted to control the amount of light reaching the film or image sensor. aspect ratio (image). The aspect ratio of an image is its displayed width divided by its height (usually expressed as "x:y"). Angular resolution. Describes the resolving power of any image forming device such as an optical or radio telescope, a microscope, a camera, or an eye. Automated optical inspection. == B == Barcode. A barcode (also bar code) is a machine-readable representation of information in a visual format on a surface. Blob discovery. Inspecting an image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks. These blobs frequently represent optical targets for machining, robotic capture, or manufacturing failure. Bitmap. A raster graphics image, digital image, or bitmap, is a data file or structure representing a generally rectangular grid of pixels, or points of color, on a computer monitor, paper, or other display device. == C == Camera. A camera is a device used to take pictures, either singly or in sequence. A camera that takes pictures singly is sometimes called a photo camera to distinguish it from a video camera. Camera Link. Camera Link is a serial communication protocol designed for computer vision applications based on the National Semiconductor interface Channel-link. It was designed for the purpose of standardizing scientific and industrial video products including cameras, cables and frame grabbers. The standard is maintained and administered by the Automated Imaging Association, or AIA, the global machine vision industry's trade group. Charge-coupled device. A charge-coupled device (CCD) is a sensor for recording images, consisting of an integrated circuit containing an array of linked, or coupled, capacitors. CCD sensors and cameras tend to be more sensitive, less noisy, and more expensive than CMOS sensors and cameras. CIE 1931 Color Space. In the study of the perception of color, one of the first mathematically defined color spaces was the CIE XYZ color space (also known as CIE 1931 color space), created by the International Commission on Illumination (CIE) in 1931. CMOS. CMOS ("see-moss")stands for complementary metal-oxide semiconductor, is a major class of integrated circuits. CMOS imaging sensors for machine vision are cheaper than CCD sensors but more noisy. CoaXPress. CoaXPress (CXP) is an asymmetric high speed serial communication standard over coaxial cable. CoaXPress combines high speed image data, low speed camera control and power over a single coaxial cable. The standard is maintained by JIIA, the Japan Industrial Imaging Association. Color. The perception of the frequency (or wavelength) of light, and can be compared to how pitch (or a musical note) is the perception of the frequency or wavelength of sound. Color blindness. Also known as color vision deficiency, in humans is the inability to perceive differences between some or all colors that other people can distinguish Color temperature. "White light" is commonly described by its color temperature. A traditional incandescent light source's color temperature is determined by comparing its hue with a theoretical, heated black-body radiator. The lamp's color temperature is the temperature in kelvins at which the heated black-body radiator matches the hue of the lamp. Color vision. CV is the capacity of an organism or machine to distinguish objects based on the wavelengths (or frequencies) of the light they reflect or emit. computer vision. The study and application of methods which allow computers to "understand" image content. Contrast. In visual perception, contrast is the difference in visual properties that makes an object (or its representation in an image) distinguishable from other objects and the background. C-Mount. Standardized adapter for optical lenses on CCD - cameras. C-Mount lenses have a back focal distance 17.5 mm vs. 12.5 mm for "CS-mount" lenses. A C-Mount lens can be used on a CS-Mount camera through the use of a 5 mm extension adapter. C-mount is a 1" diameter, 32 threads per inch mounting thread (1"-32UN-2A.) CS-Mount. Same as C-Mount but the focal point is 5 mm shorter. A CS-Mount lens will not work on a C-Mount camera. CS-mount is a 1" diameter, 32 threads per inch mounting thread. == D == Data matrix. A two dimensional Barcode. Depth of field. In optics, particularly photography and machine vision, the depth of field (DOF) is the distance in front of and behind the subject which appears to be in focus. Depth perception. DP is the visual ability to perceive the world in three dimensions. It is a trait common to many higher animals. Depth perception allows the beholder to accurately gauge the distance to an object. Diaphragm. In optics, a diaphragm is a thin opaque structure with an opening (aperture) at its centre. The role of the diaphragm is to stop the passage of light, except for the light passing through the aperture. == E == Edge detection. ED marks the points in a digital image at which the luminous intensity changes sharply. It also marks the points of luminous intensity changes of an object or spatial-taxon silhouette. Electromagnetic interference. Radio Frequency Interference (RFI) is electromagnetic radiation which is emitted by electrical circuits carrying rapidly changing signals, as a by-product of their normal operation, and which causes unwanted signals (interference or noise) to be induced in other circuits. == F == FireWire. FireWire (also known as i. Link or IEEE 1394) is a personal computer (and digital audio/video) serial bus interface standard, offering high-speed communications. It is often used as an interface for industrial cameras. Fixed-pattern noise. Flat-field correction. Frame grabber. An electronic device that captures individual, digital still frames from an analog video signal or a digital video stream. Fringe Projection Technique. 3D data acquisition technique employing projector displaying fringe pattern on a surface of measured piece, and one or more cameras recording image(s). Field of view. The field of view (FOV) is the part which can be seen by the machine vision system at one moment. The field of view depends from the lens of the system and from the working distance between object and camera. Focus. An image, or image point or region, is said to be in focus if light from object points is converged about as well as possible in the image; conversely, it is out of focus if light is not w

    Read more →
  • Social bot

    Social bot

    A social bot, refers to fully or partially automated social media accounts designed to perform most regular users’ actions, such as liking, posting content, and chatting with other users. Although their levels of autonomy vary, and often include a human-in-the-loop, social bots can use artificial intelligence to perform social media actions and can use large language models to mimic human dialogue. Social bots can operate alone or in groups that coordinate messaging as part of a network of coordinated inauthentic behavior. Social bots are often used to perform ad fraud by artificially boosting viewership and engagement metrics and to spread disinformation on social media. == Uses == Social bots are used for a large number of purposes on a variety of social media platforms, including Twitter, Instagram, Facebook, and YouTube. One common use of social bots is to inflate a social media user's apparent popularity, usually by artificially manipulating their engagement metrics with large volumes of fake likes, reposts, or replies. Social bots can similarly be used to artificially inflate a user's follower count with fake followers, creating a false perception of a larger and more influential online following than is the case. The use of social bots to create the impression of a large social media influence allows individuals, brands, and organizations to attract a higher number of human followers and boost their online presence. Fake engagement can be bought and sold in the black market of social media engagement. Corporations typically use automated customer service agents on social media to affordably manage high levels of support requests. Social bots are used to send automated responses to users’ questions, sometimes prompting the user to private message the support account with additional information. The increased use of automated support bots and virtual assistants has led to some companies laying off customer-service staff. Social bots are also often used to influence public opinion. Autonomous bot accounts can flood social media with large numbers of posts expressing support for certain products, companies, or political campaigns, creating the impression of organic grassroots support. This can create a false perception of the number of people who support a certain position, which may also have effects on the direction of stock prices or on elections. Messages with similar content can also influence fads or trends. Many social bots are also used to amplify phishing attacks. These malicious bots are used to trick a social media user into giving up their passwords or other personal data. This is usually accomplished by posting links claiming to direct users to news articles that would in actuality direct to malicious websites containing malware. Scammers often use URL shortening services such as TinyURL and bit.ly to disguise a link's domain address, increasing the likelihood of a user clicking the malicious link. The presence of fake social media followers and high levels of engagement help convince the victim that the scammer is in fact a trusted user. Social bots can be a tool for computational propaganda. Bots can also be used for algorithmic curation, algorithmic radicalization, and/or influence-for-hire, a term that refers to the selling of an account on social media platforms. == History == Bots have coexisted with computer technology since the earliest days of computing. Social bots have their roots in the 1950s with Alan Turing, whose work focused on machine intelligence with the development of the Turing Test. The following decades saw further progress made towards the goal of creating programs capable of mimicking human behavior, notably with Joseph Weizenbaum’s creation of ELIZA. Considered to be one of the first Chatbots, ELIZA could simulate natural conversations with human users through pattern matching. Its most famous script was DOCTOR, a simulation of a Rogerian psychotherapist that was programmed to chat with patients and respond to questions. With the growth of social media platforms in the early 2000s, these bots could be used to interact with much larger user groups in an inconspicuous manner. Early instances of autonomous agents on social media could be found on sites like MySpace, with social bots being used by marketing firms to inflate activity on a user’s page in an effort to make them appear more popular. Social bots have been observed on a large variety of social media websites, with Twitter being one of the most widely observed examples. The creation of Twitter bots is generally against the site’s terms of service when used to post spam or to automatically like and follow other users, but some degree of automation using Twitter’s API may be permitted if used for “entertainment, informational, or novelty purposes.” Other platforms such as Reddit and Discord also allow for the use of social bots as long as they are not used to violate policies regarding harmful content and abusive behavior. Social media platforms have developed their own automated tools to filter out messages that come from bots, although they cannot detect all bot messages. == Legal regulation == Due to the difficulty of recognizing social bots and separating them from "eligible" automation via social media APIs, it is unclear how legal regulation can be enforced. Social bots are expected to play a role in shaping public opinion by autonomously acting as influencers. Some social bots have been used to rapidly spread misinformation, manipulate stock markets, influence opinion on companies and brands, promote political campaigns, and engage in malicious phishing campaigns. In the United States, some states have started to implement legislation in an attempt to regulate the use of social bots. In 2019, California passed the Bolstering Online Transparency Act (the B.O.T. Act) to make it unlawful to use automated software to appear indistinguishable from humans for the purpose of influencing a social media user's purchasing and voting decisions. Other states such as Utah and Colorado have passed similar bills to restrict the use of social bots. The Artificial Intelligence Act (AI Act) in the European Union is the first comprehensive law governing the use of Artificial Intelligence. The law requires transparency in AI to prevent users from being tricked into believing they are communicating with another human. AI-generated content on social media must be clearly marked as such, preventing social bots from using AI in a manner that mimics human behavior. == Detection == The first generation of bots could sometimes be distinguished from real users by their often superhuman capacities to post messages. Later developments have succeeded in imprinting more "human" activity and behavioral patterns in the agent. With enough bots, it might be even possible to achieve artificial social proof. To unambiguously detect social bots as what they are, a variety of criteria must be applied together using pattern detection techniques, some of which are: cartoon figures as user pictures sometimes also random real user pictures are captured (identity fraud) reposting rate temporal patterns sentiment expression followers-to-friends ratio length of user names variability in (re)posted messages engagement rate (like/followers rate) analysis of the time series of social media posts Social bots are always becoming increasingly difficult to detect and understand. The bots' human-like behavior, ever-changing behavior of the bots, and the sheer volume of bots covering every platform may have been a factor in the challenges of removing them. Social media sites, like Twitter, are among the most affected, with CNBC reporting up to 48 million of the 319 million users (roughly 15%) were bots in 2017. Botometer (formerly BotOrNot) is a public Web service that checks the activity of a Twitter account and gives it a score based on how likely the account is to be a bot. The system leverages over a thousand features. An active method for detecting early spam bots was to set up honeypot accounts that post nonsensical content, which may get reposted (retweeted) by the bots. However, bots evolve quickly, and detection methods have to be updated constantly, because otherwise they may get useless after a few years. One method is the use of Benford's Law for predicting the frequency distribution of significant leading digits to detect malicious bots online. This study was first introduced at the University of Pretoria in 2020. Another method is artificial-intelligence-driven detection. Some of the sub-categories of this type of detection would be active learning loop flow, feature engineering, unsupervised learning, supervised learning, and correlation discovery. Some operations of bots work together in a synchronized way. For example, ISIS used Twitter to amplify its Islamic content by numerous orchestrated accounts which further pushed an item to the Hot List news, thus further a

    Read more →
  • Secret London

    Secret London

    Secret London is a Facebook group started by 21-year-old Bristol University graduate, Tiffany Philippou, on 19 January 2010 in response to a Saatchi & Saatchi competition. The group grew rapidly (180,000 members as of 8 February 2010) and is composed mostly of Londoners who use the site to share suggestions and photos of London. After the group's early success, the founder announced her intention to launch a website of the same name by crowdsourcing the design and development. The website was launched on 16 February 2010. == Other secret cities == Following the initial success of Secret London, a number of other secret groups were independently started around the world, some of which already have over 100,000 users. As of 19 February 2010, the list of other groups includes: Secret Frankfurt, Secret Tel Aviv, Secret Paris, Secret New York, Secret Tokyo, Secret Toronto, Secret Los Angeles, Secret Exeter, Secret Boston, Secret Norwich, Secret Singapore, Secret Brighton, Secret Minneapolis, Secret Sydney, Secret Canberra, Secret Brisbane, Secret Wellington, Secret Christchurch, Secret Madeira, Secret Funchal, Secret Bristol and Secret Cardiff. == Controversy == Some commentators have questioned whether it possible to share secrets without compromising them, and whether sharing tips publicly will lead to over-exposure of the businesses who are recommended.

    Read more →
  • Data stream management system

    Data stream management system

    A data stream management system (DSMS) is a computer software system to manage continuous data streams. It is similar to a database management system (DBMS), which is, however, designed for static data in conventional databases. A DBMS also offers a flexible query processing so that the information needed can be expressed using queries. However, in contrast to a DBMS, a DSMS executes a continuous query that is not only performed once, but is permanently installed. Therefore, the query is continuously executed until it is explicitly uninstalled. Since most DSMS are data-driven, a continuous query produces new results as long as new data arrive at the system. This basic concept is similar to complex event processing so that both technologies are partially coalescing. == Functional principle == One important feature of a DSMS is the possibility to handle potentially infinite and rapidly changing data streams by offering flexible processing at the same time, although there are only limited resources such as main memory. The following table provides various principles of DSMS and compares them to traditional DBMS. == Processing and streaming models == One of the biggest challenges for a DSMS is to handle potentially infinite data streams using a fixed amount of memory and no random access to the data. There are different approaches to limit the amount of data in one pass, which can be divided into two classes. For the one hand, there are compression techniques that try to summarize the data and for the other hand there are window techniques that try to portion the data into (finite) parts. === Synopses === The idea behind compression techniques is to maintain only a synopsis of the data, but not all (raw) data points of the data stream. The algorithms range from selecting random data points called sampling to summarization using histograms, wavelets or sketching. One simple example of a compression is the continuous calculation of an average. Instead of memorizing each data point, the synopsis only holds the sum and the number of items. The average can be calculated by dividing the sum by the number. However, it should be mentioned that synopses cannot reflect the data accurately. Thus, a processing that is based on synopses may produce inaccurate results. === Windows === Instead of using synopses to compress the characteristics of the whole data streams, window techniques only look on a portion of the data. This approach is motivated by the idea that only the most recent data are relevant. Therefore, a window continuously cuts out a part of the data stream, e.g. the last ten data stream elements, and only considers these elements during the processing. There are different kinds of such windows like sliding windows that are similar to FIFO lists or tumbling windows that cut out disjoint parts. Furthermore, the windows can also be differentiated into element-based windows, e.g., to consider the last ten elements, or time-based windows, e.g., to consider the last ten seconds of data. There are also different approaches to implementing windows. There are, for example, approaches that use timestamps or time intervals for system-wide windows or buffer-based windows for each single processing step. Sliding-window query processing is also suitable to being implemented in parallel processors by exploiting parallelism between different windows and/or within each window extent. == Query processing == Since there are a lot of prototypes, there is no standardized architecture. However, most DSMS are based on the query processing in DBMS by using declarative languages to express queries, which are translated into a plan of operators. These plans can be optimized and executed. A query processing often consists of the following steps. === Formulation of continuous queries === The formulation of queries is mostly done using declarative languages like SQL in DBMS. Since there are no standardized query languages to express continuous queries, there are a lot of languages and variations. However, most of them are based on SQL, such as the Continuous Query Language (CQL), StreamSQL and ESP. There are also graphical approaches where each processing step is a box and the processing flow is expressed by arrows between the boxes. The language strongly depends on the processing model. For example, if windows are used for the processing, the definition of a window has to be expressed. In StreamSQL, a query with a sliding window for the last 10 elements looks like follows: This stream continuously calculates the average value of "price" of the last 10 tuples, but only considers those tuples whose prices are greater than 100.0. In the next step, the declarative query is translated into a logical query plan. A query plan is a directed graph where the nodes are operators and the edges describe the processing flow. Each operator in the query plan encapsulates the semantic of a specific operation, such as filtering or aggregation. In DSMSs that process relational data streams, the operators are equal or similar to the operators of the Relational algebra, so that there are operators for selection, projection, join, and set operations. This operator concept allows the very flexible and versatile processing of a DSMS. === Optimization of queries === The logical query plan can be optimized, which strongly depends on the streaming model. The basic concepts for optimizing continuous queries are equal to those from database systems. If there are relational data streams and the logical query plan is based on relational operators from the Relational algebra, a query optimizer can use the algebraic equivalences to optimize the plan. These may be, for example, to push selection operators down to the sources, because they are not so computationally intensive like join operators. Furthermore, there are also cost-based optimization techniques like in DBMS, where a query plan with the lowest costs is chosen from different equivalent query plans. One example is to choose the order of two successive join operators. In DBMS this decision is mostly done by certain statistics of the involved databases. But, since the data of a data streams is unknown in advance, there are no such statistics in a DSMS. However, it is possible to observe a data stream for a certain time to obtain some statistics. Using these statistics, the query can also be optimized later. So, in contrast to a DBMS, some DSMS allows to optimize the query even during runtime. Therefore, a DSMS needs some plan migration strategies to replace a running query plan with a new one. === Transformation of queries === Since a logical operator is only responsible for the semantics of an operation but does not consist of any algorithms, the logical query plan must be transformed into an executable counterpart. This is called a physical query plan. The distinction between a logical and a physical operator plan allows more than one implementation for the same logical operator. The join, for example, is logically the same, although it can be implemented by different algorithms like a Nested loop join or a Sort-merge join. Notice, these algorithms also strongly depend on the used stream and processing model. Finally, the query is available as a physical query plan. === Execution of queries === Since the physical query plan consists of executable algorithms, it can be directly executed. For this, the physical query plan is installed into the system. The bottom of the graph (of the query plan) is connected to the incoming sources, which can be everything like connectors to sensors. The top of the graph is connected to the outgoing sinks, which may be for example a visualization. Since most DSMSs are data-driven, a query is executed by pushing the incoming data elements from the source through the query plan to the sink. Each time when a data element passes an operator, the operator performs its specific operation on the data element and forwards the result to all successive operators. == Examples == AURORA, StreamBase Systems, Inc. Archived 23 March 2009 at the Wayback Machine Hortonworks DataFlow IBM Streams NIAGARA Query Engine NiagaraST: A Research Data Stream Management System at Portland State University Odysseus, an open source Java-based framework for Data Stream Management Systems Pipeline DB PIPES Archived 24 December 2016 at the Wayback Machine, webMethods Business Events QStream SAS Event Stream Processing SQLstream STREAM StreamGlobe StreamInsight TelegraphCQ WSO2 Stream Processor

    Read more →
  • Biomedical data science

    Biomedical data science

    Biomedical data science is a multidisciplinary field which leverages large volumes of data to promote biomedical innovation and discovery. Biomedical data science draws from various fields including Biostatistics, Biomedical informatics, and machine learning, with the goal of understanding biological and medical data. It can be viewed as the study and application of data science to solve biomedical problems. Modern biomedical datasets often have specific features which make their analyses difficult, including: Large numbers of feature (sometimes billions), typically far larger than the number of samples (typically tens or hundreds) Noisy and missing data Privacy concerns (e.g., electronic health record confidentiality) Requirement of interpretability from decision makers and regulatory bodies Many biomedical data science projects apply machine learning to such datasets. These characteristics, while also present in many data science applications more generally, make biomedical data science a specific field. Examples of biomedical data science research include: Computational genomics Computational imaging Electronic health records data mining Biomedical network science Clinical Natural Language Processing (NLP) == Computational Imaging and Deep Learning == Computational imaging is a cornerstone of biomedical data science, focusing on the development of algorithms to enhance, analyze, and interpret medical imagery. In recent years, the field has been transformed by the integration of deep learning, particularly through the use of Convolutional Neural Networks. Deep learning started from researchers manually defining characteristics like edge detection or texture representation learning. In a more modern approach of computational imaging, models automatically learn a hierarchy of features directly from raw pixel data. This overlap between data science and deep learning is applied across several key tasks: Classification: Identifying the presence of specific diseases, such as distinguishing between benign and malignant tumors in histopathology slides or detecting pneumonia in chest X-rays. Segmentation: The precise delineation of anatomical structures or lesions. A notable example is the U-Net architecture, which is widely used for biomedical image segmentation to help clinicians quantify organ volume or track tumor growth. Detection: Automating the localization of small objects, such as identifying microcalcifications in mammograms or polyps during colonoscopies. Registration: The process of aligning multiple images to provide a comprehensive view of the patient's anatomy. Even with all of these enhancements, the application of deep learning in medical imaging requires accomplishing vigorous challenges. An example of these changes is building large, annotated datasets and creating the imperative for model interpretability in clinical decision-making. == Electronic Health Records == Electronic Health Records (EHRs) are a digital alternative to patient paper charts, usually including individual records or population health information. EHRs can be used in a wide variety of applications, including research and analysation as they often include demographics, diagnoses, medications, test results, and personal statistics. === History === ==== 1960s ==== The earliest precursor is considered Dr. Lawrence Weed's problem-oriented medical record (POMR) published in the 1968 which sorts and groups medical records by medical diagnoses and symptoms. The POMR was the first system to organize based off of patient information rather than the source (doctors, nurses, attendings, etc.). In 1969, the Regenstrief Institute developed and published the Regenstrief Medical Record System which established electronic writing, storage, and retrieval of records which served as the basis for modern EHR systems. ==== 2000s ==== In 2009, the Health Information Technology for Economic and Clinical Health Act (HITECH Act) was passed in the United States. This act standardized privacy and distribution of EHRs and increased the acceptance and utilization of EHRs within medical and academic settings. == Artificial Intelligence and Machine Learning Applications == Machine Learning and Artificial Intelligence have become central tools in biomedical data science. Recent advances in large language models (LLMs) have expanded their role beyond text, with models trained directly on genomic sequences enabling tasks such as gene function prediction, variant effect analysis, and drug discovery. In clinical settings, Natural Language Processing (NLP) models are applied to electronic health records to extract structured insights from unstructured clinical notes and data, supporting diagnosis and treatment planning. Beyond genomics, AI models have been applied to protein structure prediction. AlphaFold, developed by Google DeepMind, uses deep learning to predict three-dimensional protein structures from amino acid sequences with high accuracy. These predictions have been used to support drug target identification and the study of disease mechanisms. == Knowledge Graphs == Knowledge graphs (KGs) are widely used in biomedical data science to represent and analyze complex relationships among biological and medical entities. By structuring data as nodes (e.g., genes, diseases, drugs) and edges (relationships), KGs enable computational methods to extract insights and support decision-making. These biomedical relationships can be efficiently modeled and queried using technologies such as Neo4j. === Biomedical Research Applications === KGs provide biomedical researchers with a way to model complex biological systems. They have been used to identify the relationships between diseases and biomolecules, support drug repurposing, and to uncover new biological insights. Additional applications include: Identification of novel antibiotic resistance genes through graph-based link prediction. Finding associations between miRNA and diseases. Prediction of protein-protein interactions. === Clinical Applications === In clinical settings, KGs can be used to make visual representations of a patient's electronic health records. The data obtained from these graphs can assist healthcare providers in improving patient diagnoses and prescribing more effective drugs. Additionally, embeddings derived from resources like the Unified Medical Language System (UMLS) enable natural language processing of clinical text and similarity analysis between medical concepts. === Limitations === Despite their advantages, knowledge graphs face several challenges. Some of these include: High algorithmic complexity and large biological datasets make the process computationally expensive. KG construction can be a time-consuming process that requires careful attention to assign appropriate node types and vocabularies. Using data from a wide range of datasets in one KG requires them to be effectively integrated. == Privacy == A primary challenge in biomedical data science is maintaining medical privacy. Conducting research requires that data be collected on a number of people for training and testing purposes and is stored within biomedical datasets. This poses a risk for violating patient confidentiality and may dissuade people from participating in studies. The main sources of health statistics are surveys administrative and medical records health care claims data, vital records surveillance disease registries grey literature and peer-reviewed literature. Large data collection is a useful tool for researching various medical conditions. Researchers use these large datasets of information to identify factors that may make people more susceptible to certain diseases. Large amounts of collected data can help researchers identify patterns for disease probabilities. The findings can show a person is more likely for a condition, or identify environmental, social, and personal habits that may lead to adverse health issues. Institutions researching using personal medical information come with a moral and legal responsibility to protect the use of that information. Protection of the collected information has become a big concern. Sophisticated and coordinated attacks on certain medical systems happen more frequently. Medical companies, medical insurance and private businesses have invested a great deal into the protection of personal data. Despite this, data breaches continue to be documented. The chart below shows the top healthcare breaches in 2025. For these reasons, many people have reservations about giving up their personal data. Aside from the legitimate use of personal data there have been instances where companies have found methods to profit from brokering medical information. Concerns exist regarding unauthorized use of sensitive information within these data companies. If a person is identified within a dataset, then sensitive data can be used to discriminate against them. For example, insurance companies may charge a hi

    Read more →
  • Electronic lab notebook

    Electronic lab notebook

    An electronic lab notebook or electronic laboratory notebook (ELN) is a computer program designed to replace paper laboratory notebooks. Lab notebooks in general are used by scientists, engineers, and technicians to document research, experiments, and procedures performed in a laboratory. A lab notebook is often maintained to be a legal document and may be used in a court of law as evidence. Similar to an inventor's notebook, the lab notebook is also often referred to in patent prosecution and intellectual property litigation. Electronic lab notebooks offer many benefits to the user as well as organizations; they are easier to search upon, simplify data copying and backups, and support collaboration amongst many users. ELNs can have fine-grained access controls, and can be more secure than their paper counterparts. They also allow the direct incorporation of data from instruments, replacing the practice of printing out data to be stapled into a paper notebook. == Types == ELNs can be divided into two categories: "Specific ELNs" contain features designed to work with specific applications, scientific instrumentation or data types. "Cross-disciplinary ELNs" or "Generic ELNs" are designed to support access to all data and information that needs to be recorded in a lab notebook. Lab Platforms that combine an ELN, LIMS, and scientific data management together, all-in-one configurable software environment. Solutions range from specialized programs designed from the ground up for use as an ELN, to modifications or direct use of more general programs. Examples of using more general software as an ELN include using OpenWetWare, a MediaWiki install (running the same software that Wikipedia uses), WordPress, or the use of general note taking software such as OneNote as an ELN. ELN's come in many different forms. They can be standalone programs, use a client-server model, or be entirely web-based. Some use a lab-notebook approach, others resemble a blog. ELNs are embracing artificial intelligence and LLM technology to provide scientific AI chat assistants. A good many variations on the "ELN" acronym have appeared. Differences between systems with different names are often subtle, with considerable functional overlap between them. Examples include "ERN" (Electronic Research Notebook), "ERMS" (Electronic Resource (or Research or Records) Management System (or Software) and SDMS (Scientific Data (or Document) Management System (or Software). Ultimately, these types of systems all strive to do the same thing: Capture, record, centralize and protect scientific data in a way that is highly searchable, historically accurate, and legally stringent, and which also promotes secure collaboration, greater efficiency, reduced mistakes and lowered total research costs. == Objectives == A good electronic laboratory notebook should offer a secure environment to protect the integrity of both data and process, whilst also affording the flexibility to adopt new processes or changes to existing processes without recourse to further software development. The package architecture should be a modular design, so as to offer the benefit of minimizing validation costs of any subsequent changes that you may wish to make in the future as your needs change. A good electronic laboratory notebook should be an "out of the box" solution that, as standard, has fully configurable forms to comply with the requirements of regulated analytical groups through to a sophisticated ELN for inclusion of structures, spectra, chromatograms, pictures, text, etc. where a preconfigured form is less appropriate. All data within the system may be stored in a database (e.g. MySQL, MS-SQL, Oracle) and be fully searchable. The system should enable data to be collected, stored and retrieved through any combination of forms or ELN that best meets the requirements of the user. The application should enable secure forms to be generated that accept laboratory data input via PCs and/or laptops / palmtops, and should be directly linked to electronic devices such as laboratory balances, pH meters, etc. Networked or wireless communications should be accommodated for by the package which will allow data to be interrogated, tabulated, checked, approved, stored and archived to comply with the latest regulatory guidance and legislation. A system should also include a scheduling option for routine procedures such as equipment qualification and study related timelines. It should include configurable qualification requirements to automatically verify that instruments have been cleaned and calibrated within a specified time period, that reagents have been quality-checked and have not expired, and that workers are trained and authorized to use the equipment and perform the procedures. == Regulatory and legal aspects == The laboratory accreditation criteria found in the ISO 17025 standard needs to be considered for the protection and computer backup of electronic records. These criteria can be found specifically in clause 4.13.1.4 of the standard. Electronic lab notebooks used for development or research in regulated industries, such as medical devices or pharmaceuticals, are expected to comply with FDA regulations related to software validation. The purpose of the regulations is to ensure the integrity of the entries in terms of time, authorship, and content. Unlike ELNs for patent protection, FDA is not concerned with patent interference proceedings, but is concerned with avoidance of falsification. Typical provisions related to software validation are included in the medical device regulations at 21 CFR 820 (et seq.) and Title 21 CFR Part 11. Essentially, the requirements are that the software has been designed and implemented to be suitable for its intended purposes. Evidence to show that this is the case is often provided by a Software Requirements Specification (SRS) setting forth the intended uses and the needs that the ELN will meet; one or more testing protocols that, when followed, demonstrate that the ELN meets the requirements of the specification and that the requirements are satisfied under worst-case conditions. Security, audit trails, prevention of unauthorized changes without substantial collusion of otherwise independent personnel (i.e., those having no interest in the content of the ELN such as independent quality unit personnel) and similar tests are fundamental. Finally, one or more reports demonstrating the results of the testing in accordance with the predefined protocols are required prior to release of the ELN software for use. If the reports show that the software failed to satisfy any of the SRS requirements, then corrective and preventive action ("CAPA") must be undertaken and documented. Such CAPA may extend to minor software revisions, or changes in architecture or major revisions. CAPA activities need to be documented as well. Aside from the requirements to follow such steps for regulated industry, such an approach is generally a good practice in terms of development and release of any software to assure its quality and fitness for use. There are standards related to software development and testing that can be applied (see ref.).

    Read more →
  • Voice inversion

    Voice inversion

    Voice inversion scrambling is an analog method of obscuring the content of a transmission. It is sometimes used in public service radio, automobile racing, cordless telephones and the Family Radio Service. Without a descrambler, the transmission makes the speaker "sound like Donald Duck". Despite the term, the technique operates on the passband of the information and so can be applied to any information being transmitted. == Forms and details == There are various forms of voice inversion which offer differing levels of security. Overall, voice inversion scrambling offers little true security as software and even hobbyist kits are available from kit makers for scrambling and descrambling. The cadence of the speech is not changed. It is often easy to guess what is happening in the conversation by listening for other audio cues like questions, short responses and other language cadences. In the simplest form of voice inversion, the frequency p {\displaystyle p} of each component is replaced with s − p {\displaystyle s-p} , where s {\displaystyle s} is the frequency of a carrier wave. This can be done by amplitude modulating the speech signal with the carrier, then applying a low-pass filter to select the lower sideband. This will make the low tones of the voice sound like high ones and vice versa. This process also occurs naturally if a radio receiver is tuned to a single sideband transmission but set to decode the wrong sideband. There are more advanced forms of voice inversion which are more complex and require more effort to descramble. One method is to use a random code to choose the carrier frequency and then change this code in real time. This is called Rolling Code voice inversion and one can often hear the "ticks" in the transmission which signal the changing of the inversion point. Another method is split band voice inversion. This is where the band is split and then each band is inverted separately. A rolling code can also be added to this method for variable split band inversion (VSB). Common carrier frequencies are: 2.632 kHz, 2.718 kHz, 2.868 kHz, 3.023 kHz, 3.107 kHz, 3.196 kHz, 3.333 kHz, 3.339 kHz, 3.496 kHz, 3.729 kHz and 4.096 kHz. Voice inversion offers no security at all and software is available to restore the original voice, which is why it is no longer used to protect conversations today. However, voice inversion is still found in low-end Chinese walkie talkies.

    Read more →
  • Data custodian

    Data custodian

    In data governance groups, responsibilities for data management are increasingly divided between the business process owners and information technology (IT) departments. Two functional titles commonly used for these roles are data steward and data custodian. Data Stewards are commonly responsible for data content, context, and associated business rules. Data custodians are responsible for the safe custody, transport, storage of the data and implementation of business rules. Simply put, Data Stewards are responsible for what is stored in a data field, while data custodians are responsible for the technical environment and database structure. Common job titles for data custodians are database administrator (DBA), data modeler, ETL developer and data engineer. == Data custodian responsibilities == A data custodian ensures: Access to the data is authorized and controlled Data stewards are identified for each data set Technical processes sustain data integrity Processes exist for data quality issue resolution in partnership with data stewards Technical controls safeguard data Data added to data sets are consistent with the common data model Versions of master data are maintained along with the history of changes Change management practices are applied in maintenance of the database Data content and changes can be audited

    Read more →
  • Universal portfolio algorithm

    Universal portfolio algorithm

    The universal portfolio algorithm is a portfolio selection algorithm from the field of machine learning and information theory. The algorithm learns adaptively from historical data and maximizes log-optimal growth rate in the long run, per the Kelly criterion. It was introduced by the late Stanford University information theorist Thomas M. Cover. The algorithm rebalances the portfolio at the beginning of each trading period. At the beginning of the first trading period it starts with a naive diversification. In the following trading periods the portfolio composition depends on the historical total return of all possible constant-rebalanced portfolios. The universal portfolio algorithm is the predecessor of the various online portfolio selection methodologies.

    Read more →
  • Ultra (cryptography)

    Ultra (cryptography)

    Ultra was the designation adopted by British military intelligence in June 1941 for wartime signals intelligence obtained by breaking high-level encrypted enemy radio and teleprinter communications at the Government Code and Cypher School (GC&CS) at Bletchley Park. Ultra eventually became the standard designation among the western Allies for all such intelligence. The name arose because the intelligence obtained was considered more important than that designated by the highest British security classification then used (Most Secret) and so was regarded as being Ultra Secret. Several other cryptonyms had been used for such intelligence. The code name "Boniface" was used as a cover name for Ultra. In order to ensure that the successful code-breaking did not become apparent to the Germans, British intelligence created a fictional MI6 master spy, Boniface, who controlled a fictional series of agents throughout Germany. Information obtained through code-breaking was often attributed to the human intelligence from the Boniface network. The U.S. used the codename Magic for its decrypts from Japanese sources, including the "Purple" cipher. Much of the German cipher traffic was encrypted on the Enigma machine. Used properly, the German military Enigma would have been virtually unbreakable; in practice, shortcomings in operation allowed it to be broken. The term "Ultra" has often been used almost synonymously with "Enigma decrypts". However, Ultra also encompassed decrypts of the German Lorenz SZ 40/42 machines that were used by the German High Command, and the Hagelin machine. Many observers, at the time and later, regarded Ultra as immensely valuable to the Allies. Winston Churchill was reported to have told King George VI, when presenting to him Stewart Menzies (head of the Secret Intelligence Service and the person who controlled distribution of Ultra decrypts to the government): "It is thanks to the secret weapon of General Menzies, put into use on all the fronts, that we won the war!" F. W. Winterbotham quoted the western Supreme Allied Commander, Dwight D. Eisenhower, at war's end describing Ultra as having been "decisive" to Allied victory. Sir Harry Hinsley, Bletchley Park veteran and official historian of British Intelligence in World War II, made a similar assessment of Ultra, saying that while the Allies would have won the war without it, "the war would have been something like two years longer, perhaps three years longer, possibly four years longer than it was." However, Hinsley and others have emphasized the difficulties of counterfactual history in attempting such conclusions, and some historians, such as John Keegan, have said the shortening might have been as little as the three months it took the United States to deploy the atomic bomb. == Sources of intelligence == Most Ultra intelligence was derived from reading radio messages that had been encrypted with cipher machines, complemented by material from radio communications using traffic analysis and direction finding. In the early phases of the war, particularly during the eight-month Phoney War, the Germans could transmit most of their messages using land lines and so had no need to use radio. This meant that those at Bletchley Park had some time to build up experience of collecting and starting to decrypt messages on the various radio networks. German Enigma messages were the main source, with those of the German air force (the Luftwaffe) predominating, as they used radio more and their operators were particularly ill-disciplined. === German === ==== Enigma ==== "Enigma" refers to a family of electro-mechanical rotor cipher machines. These produced a polyalphabetic substitution cipher and were widely thought to be unbreakable in the 1920s, when a variant of the commercial Model D was first used by the Reichswehr. The German Army (Heer), Navy, Air Force, Nazi party, Gestapo and German diplomats used Enigma machines in several variants. Abwehr (German military intelligence) used a four-rotor machine without a plugboard and Naval Enigma used different key management from that of the army or air force, making its traffic far more difficult to cryptanalyse; each variant required different cryptanalytic treatment. The commercial versions were not as secure and Dilly Knox of GC&CS is said to have broken one before the war. German military Enigma was first broken in December 1932 by Marian Rejewski and the Polish Cipher Bureau, using a combination of brilliant mathematics, the services of a spy in the German office responsible for administering encrypted communications, and good luck. The Poles read Enigma to the outbreak of World War II and beyond, in France. At the turn of 1939, the Germans made the systems ten times more complex, which required a tenfold increase in Polish decryption equipment, which they could not meet. On 25 July 1939, the Polish Cipher Bureau handed reconstructed Enigma machines and their techniques for decrypting ciphers to the French and British. Gordon Welchman wrote, Ultra would never have got off the ground if we had not learned from the Poles, in the nick of time, the details both of the German military Enigma machine, and of the operating procedures that were in use. At Bletchley Park, some of the key people responsible for success against Enigma included mathematicians Alan Turing and Hugh Alexander and, at the British Tabulating Machine Company, chief engineer Harold Keen. After the war, interrogation of German cryptographic personnel led to the conclusion that German cryptanalysts understood that cryptanalytic attacks against Enigma were possible but were thought to require impracticable amounts of effort and investment. The Poles' early start at breaking Enigma and the continuity of their success gave the Allies an advantage when World War II began. ==== Lorenz cipher ==== In June 1941, the Germans started to introduce on-line stream cipher teleprinter systems for strategic point-to-point radio links, to which the British gave the code-name Fish. Several systems were used, principally the Lorenz SZ 40/42 (codenamed "Tunny" by the British) and Geheimfernschreiber ("Sturgeon"). These cipher systems were cryptanalysed, particularly Tunny, which the British thoroughly penetrated. It was eventually attacked using Colossus machines, which were the first digital programme-controlled electronic computers. In many respects the Tunny work was more difficult than for the Enigma, since the British codebreakers had no knowledge of the machine producing it and no head-start such as that the Poles had given them against Enigma. Although the volume of intelligence derived from this system was much smaller than that from Enigma, its importance was often far higher because it produced primarily high-level, strategic intelligence that was sent between Wehrmacht high command (Oberkommando der Wehrmacht, OKW). The eventual bulk decryption of Lorenz-enciphered messages contributed significantly, and perhaps decisively, to the defeat of Nazi Germany. Nevertheless, the Tunny story has become much less well known among the public than the Enigma one. At Bletchley Park, some of the key people responsible for success in the Tunny effort included mathematicians W. T. "Bill" Tutte and Max Newman and electrical engineer Tommy Flowers. === Italian === In June 1940, the Italians were using book codes for most of their military messages, except for the Italian Navy, which in early 1941 had started using a version of the Hagelin rotor-based cipher machine C-38. This was broken from June 1941 onwards by the Italian subsection of GC&CS at Bletchley Park. === Japanese === In the Pacific theatre, a Japanese cipher machine, called "Purple" by the Americans, was used for highest-level Japanese diplomatic traffic. It produced a polyalphabetic substitution cipher, but unlike Enigma, was not a rotor machine, being built around electrical stepping switches. It was broken by the US Army Signal Intelligence Service and disseminated as Magic. Detailed reports by the Japanese ambassador to Germany were encrypted on the Purple machine. His reports included reviews of German assessments of the military situation, reviews of strategy and intentions, reports on direct inspections by the ambassador (in one case, of Normandy beach defences), and reports of long interviews with Hitler. The Japanese are said to have obtained an Enigma machine in 1937, although it is debated whether they were given it by the Germans or bought a commercial version, which, apart from the plugboard and internal wiring, was the German Heer/Luftwaffe machine. Having developed a similar machine, the Japanese did not use the Enigma machine for their most secret communications. The chief fleet communications code system used by the Imperial Japanese Navy was called JN-25 by the Americans, and by early 1942 the US Navy had made considerable progress in decrypting Japanese naval messages. The US Army also made progress on the

    Read more →
  • Stegomalware

    Stegomalware

    Stegomalware is a form of malicious software that leverages steganography techniques to conceal its code, configuration data, or command-and-control (C&C) communications within seemingly benign digital media such as images, audio files, videos, documents, or network traffic. It typically embeds encrypted or obfuscated payloads into digital media and only extracts and executes them at runtime, which makes traditional signature-based and sandbox-based detection significantly more difficult. Stegomalware has been observed in attacks ranging from advanced persistent threats (APTs) to financially motivated cybercrime, and is now the subject of dedicated academic surveys, research projects, and international law-enforcement initiatives. The key distinction between stegomalware and traditional obfuscated malware lies in the encoding location. After obfuscation, malicious code remains present within the executable and can theoretically be discovered through static analysis. In contrast, stegomalware hides the payload entirely within a cover medium (image, audio, etc.), remaining invisible until the malware dynamically extracts and executes it at runtime. == History == The term stegomalware was formally introduced by researchers Águila, Laskov, and others in the context of mobile malware and presented at the Inscrypt (Information Security and Cryptology) conference in 2014. This marked the first academic formalization of the concept, though earlier work had already identified that botnets and mobile malware could use steganography and covert channels for command-and-control communication over probabilistically unobservable channels. Since its introduction, stegomalware has evolved from a theoretical concern to a documented threat. In 2011, the APT operation known as "Operation Shady RAT" became one of the first documented cases of stegomalware in the wild, using digital images to hide Internet Protocol addresses and command-and-control server addresses. The same year, the Duqu malware (targeting industrial manufacturers) embedded victim data into JPEG image files before exfiltration, making the data transfer virtually undetectable to network-level security tools. From 2014 onwards, stegomalware became more prevalent in organized cybercrime and advanced persistent threat campaigns. Notable examples include Zeus/Zbot, which masked configuration data in images; Gatak/Stegoloader, which hid shellcode in PNG files; TeslaCrypt, which embedded C&C commands in JPEGs; and Cerber, which concealed ransomware payloads within images. By the 2010s, stegomalware had become established as a preferred evasion technique for espionage, financial theft, and ransomware distribution campaigns. Recent surveys (2020–2025) document that stegomalware has increasingly been exploited by adversaries targeting banks, enterprises, government agencies, educational institutions, and internet users via malvertising campaigns. The technique is now considered a sophisticated method of attack worthy of dedicated international law-enforcement attention. == Technical Characteristics and Definitions == Stegomalware operates through a three-component architecture: Stegotext (R): An innocent-looking digital asset (image, audio file, etc.) into which the malicious payload is embedded. Secret key (sk): A key used by the embedding and extraction algorithms, typically hardcoded into the malware. Payload (p): The actual malicious code, configuration data, or C&C commands hidden within the stegotext. The malware extracts the payload at runtime using the secret key and either executes it directly or uses it to download additional stages of the attack. Stegomalware can be classified into several types based on deployment method: Type 0 (Autonomous): Both the stegotext and extraction algorithm are embedded within the malware application itself. The malicious payload is extracted and executed locally without external communication. Type I (Update): The stegotext and secret key are downloaded from a remote server at runtime; only the extraction algorithm is included in the malware. This variant is more flexible, allowing attackers to push updated payloads. Type II (External Algorithm): Neither the stegotext nor the extraction algorithm are distributed with the malware; both are fetched from an attacker-controlled infrastructure, providing maximum flexibility and evasion. == Steganography techniques == === Spatial domain methods === Stegomalware predominantly uses steganographic methods designed for images, as images are the most common cover medium in the wild. The most basic spatial domain technique is Least Significant Bit (LSB) substitution, which replaces the least significant bits of pixel color values with payload bits. While simple and easy to implement, LSB is also relatively easy to detect through statistical analysis. More sophisticated spatial domain techniques include: HUGO (High Undetectable steGO) (2010): Minimizes detectable distortion by distributing the payload across multiple pixels, achieving embedding capacity with reduced statistical footprint. WOW (Wavelet Obtained Weights) (2012): Embeds data preferentially in textured regions of images where modifications are less perceptually noticeable. UNIWARD (Universal Wavelet Relative Distortion) (2014): Uses a universal distortion function applicable to multiple image formats, balancing payload capacity with undetectability. HILL (2014): Applies high-pass and low-pass filters to identify robust embedding regions. MiPOD (Minimizing the Power of Optimal Detector) (2016): Designed to minimize the power of theoretical optimal steganalysis detectors. === Transform domain methods === Transform domain techniques convert images into the frequency domain (e.g., using DCT or DWT) before embedding, allowing for more robust hiding in JPEG and other compressed formats: Embedding in DCT coefficients (used in JPEG compression) Embedding in DWT coefficients (used in lossless formats) Spread spectrum techniques, which distribute the payload across many frequency components Transform domain methods are generally more resistant to noise, compression, and image transformations than spatial methods. === Generative adversarial network (GAN) methods === Recent advances in machine learning have introduced GAN-based steganography, where a generative model produces stego images that minimize detectable artifacts: SGAN (Steganographic GAN) (2017): First GAN applied to steganography, using a generator, discriminator, and steganalysis network. ASDL-GAN (2017): Performs automatic steganographic distortion learning at the pixel level. SteganoGAN (2019): Improves upon earlier GAN models, achieving higher embedding capacity and robustness. HiGAN (Hiding Images GAN) (2020): Enables hiding one image within another while maintaining visual plausibility. GAN-based approaches are more resilient to standard steganalysis attacks but remain an emerging threat requiring further research. == Notable malware campaigns == Stegomalware has been documented in numerous high-profile cyber attacks and campaigns. Notable examples include: Operation Shady RAT (2011): Used digital images to hide command-and-control server addresses in targeted espionage. Duqu (2011): Embedded victim data into JPEG files to exfiltrate industrial control system information. Zeus/Zbot (2014): Masked banking configuration data inside JPEG files exploited via malvertising. Gatak/Stegoloader (2015): Hid shellcode in PNG files for software licensing attacks and bot command execution. TeslaCrypt (2015): Embedded C&C commands and ransomware keys in JPEG images. Cerber (2016): Concealed executable ransomware code in JPEG files distributed via phishing. DNSChanger (2016): Embedded malicious code in PNG files for DNS hijacking campaigns. Sundown Exploit Kit (2017): Distributed exploit code in PNG files via malvertising. AdGholas (2017): Used JPEG steganography to distribute ransomware via malvertising. Synccrypt (2017): Hidden ransomware components in JPEG-steganographic encrypted archives. ZeroT/PlugX (2017): Hid Remote Access Trojan payloads in BMP files for espionage. Loki Bot (2018): Concealed malware installers in JPEG and video files. Waterbug (APT28) (2019): Injected malicious DLLs into WAV audio files. Shlayer (macOS adware) (2019): Hid malicious URLs in JPEG files via malvertising. === Attack vectors === The most common attack vectors for stegomalware include: Phishing emails with malicious attachments or links Malvertising campaigns using malicious banner advertisements Exploit kits through compromised or malicious websites Legitimate application vulnerabilities (e.g., watering-hole attacks) Fake software distribution (cracked software, keygen tools) === Exploitation stages === Stegomalware typically serves one or more roles in attack lifecycles: Payload delivery: Stego images contain full executable code or shellcode. C&C communication: Hidden data contains server addresses or command instructio

    Read more →