AI App Notebook Lm

AI App Notebook Lm — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Image translation

    Image translation

    Image translation is the machine translation of images of printed text (posters, banners, menus, screenshots etc.). This is done by applying optical character recognition (OCR) technology to an image to extract any text contained in the image, and then have this text translated into a language of their choice, and the applying digital image processing on the original image to get the translated image with a new language. == General == Machine translation made available on the internet (web and mobile) is a notable advance in multilingual communication eliminating the need for an intermediary translator/interpreter, translating foreign texts still poses a problem to the user as they cannot be expected to be able to type the foreign text they wish to translate and understand. Manually entering the foreign text may prove to be a difficulty especially in cases where an unfamiliar alphabet is used from a script which user can't read, e.g. Cyrillic, Chinese, Japanese etc. for an English speaker or any speaker of a Latin-based language or vice versa. The technical advancements in OCR made it possible to recognize text from images. The possibility to use one's mobile device's camera to capture and extract printed text is also known as mobile OCR and was first introduced in Japanese manufactured mobile telephones in 2004. Using the handheld's camera one could take a picture of (a line of) text and have it extracted (digitalized) for further manipulation such as storing the information in their contacts list, as a web page address (URL) or text to use in an SMS/email message etc. Presently, mobile devices having a camera resolution of 2 megapixels or above with an auto-focus ability, often feature the text scanner service. Taking the text scanning facility one step further, image translation emerged, giving users the ability to capture text with their mobile phone's camera, extract the text, and have it translated in their own language. More and more applications emerged on this technology including Word Lens. After getting acquired by Google, it was made a part of Google Translate mobile app. Another simultaneous advancement in Image Processing, has also made it possible now to replace the text on the image with the translated text and create a new image altogether. == History == The development of the image translation service springs from the advances in OCR technology (miniaturization and reduction of memory resources consumed) enabling text scanning on mobile telephones. Among the first to announce mobile software capable of “reading” text using the mobile device's camera is International Wireless Inc. who in February 2003 released their “CheckPoint” and “WebPoint” applications. “CheckPoint” reads critical symbolic information on checks and is aimed at reducing losses that mobile merchants suffer from “bounced” checks by scanning the MICR number on the bottom of a check, while “WebPoint” enables the visual recognition and decoding of printed URL's, which are then opened by the device's web browser. The first commercial release of a mobile text scanner, however, took place in December 2004 when Vodafone and Sharp began selling the 902SH mobile which was the first to feature a 2 megapixel digital camera with optical zoom. Among the device's various multimedia features was the built-in text/bar code/QR code scanner. The text scanner function could handle up to 60 alphabetical characters simultaneously. The scanned text could be then sent as an email or SMS message, added as a dictionary entry or, in the case of scanned URLs, opened via the device's web browser. All subsequent Sharp mobiles feature the text scanner functionality. In September 2005, NEC Corporation and the Nara Institute of Science and Technology in Japan (NAIST) announced new software capable of transforming cameraphones into text scanners. The application differs substantially from similarly equipped mobile telephones in Japan (able to scan businesscards and small bits of text and use OCR to convert that to editable text or to URL addresses) by it ability to scan a whole page. The two companies, however, said they would not release the software commercially before the end of 2008. Combining the text scanner function with machine translation technology was first made by US company RantNetwork who in July 2007 started selling the Communilator, a machine translation application for mobile devices featuring the Image Translation functionality. Using the built-in camera, the mobile user could take a picture of some printed text, apply OCR to recognize the text and then translate it into any one of over 25 language available. In April 2008 Nokia showcased their Shoot-to-Translate application for the N73 model which is capable of taking a picture using the device's camera, extracting the text and then translating it. The application only offers Chinese to English translation, and does not handle large segments of text. Nokia said they are in the process of developing their Multiscanner product which, besides scanning text and business cards, would be able to translate between 52 languages. Again in April 2008, Korean company Unichal Inc. released their handheld Dixau text scanner capable of scanning and recognizing English text and then translating it into Korean using online translation tools such as Wikipedia or Google Translate. The device is connected to a PC or a laptop via the USB port. In February 2009, Bulgarian company Interlecta presented at the Mobile World Congress in Barcelona their mobile translator including image recognition and speech synthesis. The application handles all European languages along with Chinese, Japanese and Korean. The software connects to a server over the Internet to accomplish the image recognition and the translation. In May 2014, Google acquired Word Lens to improve the quality of visual and voice translation. It is able to scan text or picture with one's device and have it translated instantly. Since the OCR has been improving many companies or website started combining OCR and translation, to read the text from an image and show the translated text. In August 2018, an Indian company created ImageTranslate. It is able to read, translate and re-create the image in another language. As of late 2018, the tool added 13 new languages, including Arabic, Thai, Vietnamese, Hindi, and Bengali, significantly increasing its utility in Asia and the Middle East. This helps users translate photos already stored in their phone's gallery, not just live, real-time views. Currently, image translation is offered by the following companies: Google Translate app with camera ImageTranslate Yandex

    Read more →
  • Multistage interconnection networks

    Multistage interconnection networks

    Multistage interconnection networks (MINs) are a class of high-speed computer networks usually composed of processing elements (PEs) on one end of the network and memory elements (MEs) on the other end, connected by switching elements (SEs). The switching elements themselves are usually connected to each other in stages, hence the name. MINs are typically used in high-performance or parallel computing as a low-latency interconnection (as opposed to traditional packet switching networks), though they could be implemented on top of a packet switching network. Though the network is typically used for routing purposes, it could also be used as a co-processor to the actual processors for such uses as sorting; cyclic shifting, as in a perfect shuffle network; and bitonic sorting. == Background == Interconnection network are used to connect nodes, where nodes can be a single processor or group of processors, to other nodes. Interconnection networks can be categorized on the basis of their topology. Topology is the pattern in which one node is connected to other nodes. There are two main types of topology: static and dynamic. Static interconnect networks are hard-wired and cannot change their configurations. A regular static interconnect is mainly used in small networks made up of loosely couple nodes. The regular structure signifies that the nodes are arranged in specific shape and the shape is maintained throughout the networks. Some examples of static regular interconnections are: Completely connected network In a mesh network, multiple nodes are connected with each other. Each node in the network is connected to every other node in the network. This arrangement allows proper communication of the data between the nodes. But, there are a lot of communication overheads due to the increased number of node connections. Shared busThis network topology involves connection of the nodes with each other over a bus. Every node communicates with every other node using the bus. The bus utility ensures that no data is sent to the wrong node. But, the bus traffic is an important parameter which can affect the system. RingThis is one of the simplest ways of connecting nodes with each other. The nodes are connected with each other to form a ring. For a node to communicate with some other node, it has to send the messages to its neighbor. Therefore, the data message passes through a series of other nodes before reaching the destination. This involves increased latency in the system. TreeThis topology involves connection of the nodes to form a tree. The nodes are connected to form clusters and the clusters are in-turn connected to form the tree. This methodology causes increased complexity in the network. Hypercube This topology consists of connections of the nodes to form cubes. The nodes are also connected to the nodes on the other cubes. ButterflyThis is one of the most complex connections of the nodes. As the figure suggests, there are nodes which are connected and arranged in terms of their ranks. They are arranged in the form of a matrix. In dynamic interconnect networks, the nodes are interconnected via an array of simple switching elements. This interconnection can then be changed by use of routing algorithms, such that the path from one node to other nodes can be varied. Dynamic interconnections can be classified as: Single stage Interconnect Network Multistage interconnect Network Crossbar switch connections == Crossbar Switch Connections == In crossbar switch, there is a dedicated path from one processor to other processors. Thus, if there are n inputs and m outputs, we will need nm switches to realize a crossbar. As the number of outputs increases, the number of switches increases by factor of n. For large network this will be a problem. An alternative to this scheme is staged switching. == Single Stage Interconnect Network == In a single stage interconnect network, the input nodes are connected to output via a single stage of switches. The figure shows 88 single stage switch using shuffle exchange. As one can see, from a single shuffle, not all input can reach all output. Multiple shuffles are required for all inputs to be connected to all the outputs. == Multistage Interconnect Network == A multistage interconnect network is formed by cascading multiple single stage switches. The switches can then use their own routing algorithm, or be controlled by a centralized router, to form a completely interconnected network. Multistage Interconnect Network can be classified into three types: Non-blocking: A non-blocking network can connect any idle input to any idle output, regardless of the connections already established across the network. Crossbar is an example of this type of network. Rearrangeable non-blocking: This type of network can establish all possible connections between inputs and outputs by rearranging its existing connections. Blocking: This type of network cannot realize all possible connections between inputs and outputs. This is because a connection between one free input to another free output is blocked by an existing connection in the network. The number of switching elements required to realize a non-blocking network in highest, followed by rearrangeable non-blocking. Blocking network uses least switching elements. == Examples == Multiple types of multistage interconnection networks exist. === Omega network === An Omega network consists of multiple stages of 22 switching elements. Each input has a dedicated connection to an output. An NN omega network has log2(N) stages and N/2 switching elements in each stage for a perfect shuffle between stages. Thus the network has complexity of 0(N log(N)). Each switching element can employ its own switching algorithm. Consider an 88 omega network. There are 8! = 40320 1-to-1 mappings from input to output. There are 12 switching element for a total permutation of 2^12 = 4096. Thus, it is a blocking network. === Clos network === A Clos network uses 3 stages to switch from N inputs to N outputs. In the first stage, there are r= N/n crossbar switches and each switch is of size nm. In the second stage there are m switches of size rr and finally the last stage is a mirror of the first stage with r switches of size mn. A clos network will be completely non-blocking if m >= 2n-1. The number of connections, though more than omega network is much less than that of a crossbar network. === Beneš network === A Beneš network is a rearrangeably non-blocking network derived from the clos network by initializing n = m = 2. There are (2log2(N) - 1) stages, with each stage containing N/2 22 crossbar switches. An 88 Beneš network has 5 stages of switching elements, and each stage has 4 switching elements. The center three stages has two 44 benes network. The 44 Beneš network, can connect any input to any output recursively.

    Read more →
  • Data marketplace

    Data marketplace

    Data marketplace is an online platform for sharing and consuming data in the form of data assets or data products. Part of the data management stack, it aims to bring together data producers and data consumers (including business users and AI) in a single space, with the objective of increasing access to understandable, high-quality data. Included within its Data Marketplaces and Exchange (DME) category by Gartner, data marketplaces can provide data internally within an organization, externally with partners, or as open data. == Concept == Digitization has dramatically increased data volumes within organizations, with IDC predicting that by 2025 the world will contain 175 zettabytes of data. This has created a need to both manage this data and provide access to it to enable business intelligence and data analysis. However, data is often scattered within multiple systems (such as data warehouses and data lakes), and is in formats that are only understandable by technical experts, such as data scientists. According to IDC, 81% of IT leaders cite data silos as a major barrier to digital transformation. This means that data is not freely available to business users or external audiences such as partners or citizens, limiting its value, and holding back AI deployments. Data marketplaces solve this issue, providing seamless, self-service access to high-quality data in an understandable, secure and auditable manner. They break down data silos, reduce friction in data access, and enable a broader range of users, including non-technical profiles, to find, understand, and consume data autonomously. Data assets on the marketplace can be raw data, data visualizations or data products. Data marketplaces combine data management functions such as data governance with the user-friendly experience offered by e-commerce marketplaces in order to increase the usage of data. These include features such as powerful search engines, feedback, ratings, subscriptions and product description sheets. According to Gartner, data marketplaces provide infrastructure, transactional capabilities, and services for both consumers and providers of data assets. == History and timeline == Data marketplaces have evolved since they first emerged in terms of both their scope and usage. === 2000s === With the rise of the internet, data brokers began collecting, aggregating, distributing and selling personal, financial and marketing data to third parties online. Data marketplaces were deployed to monetize this data, making it discoverable and accessible to users, either through subscriptions or one-off purchases. At the same time, regulations, such as the US Open Government Initiative of 2009 and others around the world mandated greater transparency and data sharing with the public. Data sharing portals were created by public and government bodies to make this information available through self-service to all users. === 2010s === Due to the growth of big data and cloud platforms, cloud-based data exchange platforms emerged. These were offered by major infrastructure providers, and included Amazon Web Services (AWS) Data Exchange, Snowflake Data Marketplace, and the Google Cloud Platform. These platforms moved beyond simple data brokerage or open data by providing structured, catalogued data sharing between organizations. === 2020s === Driven by a need to increase internal data sharing with both business users and AI, organizations are now looking to adopt internal data marketplaces. These aim to democratize data consumption by providing seamless access for all employees and AI to trusted data, including data products, through an intuitive, e-commerce style experience. According to Gartner analyst Richa Jha, "by providing a single, governed platform for discovering, sharing, and scaling data products, data marketplaces drive productivity, collaboration, and ROI across the enterprise." == Data marketplaces within the overall data architecture == Data marketplaces provide a consumption and collaboration layer for data. That means they complement and integrate with other parts of the overall data architecture, including: === Data warehouses and data lakes === Data marketplaces connect to data sources, such as data warehouses or data lakes, to provide intuitive access to the data stored within them, enabling data to be shared and distributed to non-technical audiences. Access can be direct, with data and data products stored within the data marketplace or virtualized. === Data catalog === A data catalog provides a technical inventory of an organization's data estate. It collects technical information on all available data assets within an organization, based on metadata descriptions. This ensures traceability, and supports compliance and governance requirements. Unlike a data marketplace, a data catalog does not provide access to data, and is designed to be used by data professionals, rather than the business. This means it lacks an intuitive, understandable interface and is consequently not easily accessible by business users. === Data mesh === Data mesh is an architecture and framework for data management, first defined by Zhamak Dehghani in 2019. It aims to decentralize data ownership to delegate responsibility, empowering teams and focusing on delivering data to users in the form of self-service data products. The data marketplace is a central pillar of data mesh, providing intuitive access to these data products, and creating a collaboration space for data owners and data consumers. === Data product === Data products are high-value, consumable data assets that package high-quality data and associated tools to enable seamless usage by business users at scale. First defined by McKinsey in 2022, they have an identified owner, a service level agreement (SLA), and a reusability logic. == Core components of a data marketplace == A data marketplace typically includes specific core components: === E-commerce style interface === An e-commerce style experience that engages non-technical users, minimizes the need for training and builds confidence and trust in data. Look and feel should be customizable to incorporate corporate design guidelines to ensure consistency with other organizational applications. === Built-in data catalog === As in a standalone data catalog, this indexes all available data, based on metadata that includes type, source, owner, freshness, and quality level. === Discovery and search engine === This enables users to search, filter, explore and discover available data intuitively. As in an e-commerce marketplace, it should be intelligent, and provide relevant results based on natural language queries. === Access control and security management === Data marketplaces will contain data that needs to be protected under regulations such as the General Data Protection Regulation (GDPR) in Europe, the California Consumer Privacy Act (CCPA) in the United States, and sector-specific frameworks in industries such as finance and healthcare. To ensure both security and compliance while maximizing data consumption, the data marketplace should include granular access management and a full audit trail. === Semantic layer and business glossary === Different parts of the business are likely to use different terms to describe data. This leads to inconsistencies and an inability to share data across systems and teams. The semantic layer and business glossary standardize a shared vocabulary and common definitions of business indicators and concepts, providing a single language for data across the business and for AI agents. === Data governance mechanisms === These enforce corporate data governance policies, ensuring data traceability through data lineage, quality certification, usage monitoring, and continuous improvement through user feedback loops. === Collaboration features === As on an e-commerce website, a data marketplace should provide collaboration features that bring together data users and data owners. This includes the ability to rate data products, share use cases, and provide feedback to data owners, creating a community around data and supporting a data-driven culture. == Types of data marketplace == While they share the same underlying technology, data marketplaces can be deployed in three broad ways: === Internal data marketplaces === These bring together data from across an organization and make it available via self-service to employees from across the business. They aim to widen access to data and consequently to improve decision-making and reporting, increase performance and maximize efficiency. === Ecosystem data marketplaces === These extend sharing beyond a single organization, enabling multiple partners (public institutions, industry players, research bodies) to share and consume data within a governed framework. Data can be provided by all parties or simply by one organization and consumed by others. Ecosystem data marketplaces are particularly relevant in

    Read more →
  • Online Safety Amendment (Social Media Minimum Age) Act 2024

    Online Safety Amendment (Social Media Minimum Age) Act 2024

    The Online Safety Amendment (Social Media Minimum Age) Act 2024 is an Australian act of parliament that prohibits minors under the age of 16 from holding an account on certain social media platforms. It is an amendment to the Online Safety Act 2021 and was passed by the Parliament of Australia on 29 November 2024. It imposes monetary penalties on social media companies that fail to take reasonable steps to prevent minors under 16 that are located in Australia from having accounts on their services. The legislation allows the government to determine which social media platforms must ban age‑restricted users and proclaim a date for the commencement of the ban, with those provisions taking effect on 10 December 2025. Facebook, Instagram, Reddit, Snapchat, TikTok, Twitter, Threads, Twitch, Kick, and YouTube were age‑restricted on 10 December 2025, with the possibility that more platforms may be added. The act is being challenged in the High Court by the Digital Freedom Project. == Background == The ban on access to social media by young people by the federal government originated in November 2023, when shadow communications minister David Coleman introduced a private member's bill requiring the government to conduct a trial for age-verification technology on pornography and social media platforms. While the bill did not succeed, the Albanese government funded the trial in the 2024 Australian federal budget. In June 2024, opposition leader Peter Dutton pledged that a Coalition government would implement a ban on social media for under-16s within 100 days of taking office. The following month, prime minister Anthony Albanese announced the government would introduce legislation banning under-16s from social media. The Online Safety Amendment (Social Media Minimum Age) Bill 2024 was introduced into parliament by minister for communications Michelle Rowland on 21 November 2024, passing both houses on 28 November 2024. The ban on access to social media by young people by the federal government also gained momentum following an entreaty by the wife of the premier of South Australia, Peter Malinauskas, to her husband. She requested that he read The Anxious Generation by Jonathan Haidt and take action to address the impact of social media on the mental health of children. The couple have four young children, and, thinking of them, the premier thought that government should play a part in helping parents to regulate use of social media by their children at home. Malinauskas contacted former High Court chief justice Robert French, who agreed to look at the issue, and in September 2024 handed the premier a 267 page proposal, which he dubbed a "Swiss Army knife" rather than a machete, to adjust to social media's "changing landscape and its complexity". The leaders of other states and territories gave their support to Malinauskas's idea, and he took the French report to National Cabinet to collaborate with chief ministers, premiers, and the prime minister. Community support swelled after stories of parents who had lost their children to suicide after being bullied on social media were published. Albanese himself was moved by a personal letter received from Kelly O'Brien, whose 12-year-old daughter Charlotte had taken her own life due to bullying at school. An event took place at the sidelines of the United Nations General Assembly session in September 2025 at which a mother spoke of her daughter's suicide as "death by bullying ... enabled by social media". The speech won support from world leaders in Greece, Fiji, Tonga and the president of the European Commission Ursula von der Leyen. In early September 2024, South Australia proposed legislation similar to the federal law now in place. The state-based version was intended to ban users under the age of 14, unlike the federal law, which bans those under 16. The state-based law also proposed to require parental consent for 14 and 15‑year‑olds. Later in September, prime minister Anthony Albanese announced that his government intended to introduce legislation to set a minimum age requirement for social media. In November 2024, the federal government indicated their intention to engage the Age Check Certification Scheme following a tender process for an age assurance technology trial. The Albanese government's proposed ban was supported by the governments of every state and territory. Albanese described social media as a "scourge", and said "I want people to spend more time on the footy field or the netball court than they're spending on their phones", that family members are "worried sick about the safety of our kids online", and that social media "is having a negative impact on young people's mental health and on anxiety". Albanese's statements followed an earlier pledge by Liberal opposition leader Peter Dutton who was pushed by the early advocacy of shadow communications minister David Coleman to implement a ban on social media for under 16s within 100 days of being elected. The opposition organised an open letter signed by 140 experts who specialise in child welfare and technology. The opposition was concerned about the invasion of privacy that will occur with the introduction of identification-based age checks. An advocacy group for digital companies in Australia called the plans a "20th Century response to 21st Century challenges". A director of a mental health service voiced concerns, stating that "73% of young people across Australia who accessed mental health support did so through social media". == Implementation == Social media companies will receive a transition period of one year after the legislation is enacted to introduce reasonable controls preventing minors under the age of 16 from holding accounts on their services while physically located in Australia. Enforcement will involve fines of up to A$49.5 million for companies failing to take such steps, with no consequences for parents and children who violate the restrictions. There are no parental consent exceptions to the ban, and while the use of virtual private networks (VPNs) to access these services remains legal in Australia, the services are expected to try to stop under 16s from using VPNs to pretend to be outside Australia. The expectation is to make best-efforts to implement the ban on platforms including Facebook, Instagram, Reddit, Snapchat, TikTok, Twitter, Threads, Twitch, Kick and YouTube. Some social media companies are now obligated to become good enough at profiling Australian children under 16 to satisfy the Australian government they tried to implement the ban to avoid being fined. Consequently, social media companies said they will try to identify restricted users using various methods including behavioural inferencing. On 5 November 2025, it was announced that online gaming platform Roblox will not be banned, but Reddit and live-streaming platform Kick will be added to the list of platforms to be banned. A report by Age Check Certification Scheme, a UK company recruited by the government to consult on the technology used to implement the restrictions, was issued in June 2025, ahead of the December deadline to implement the ban. In June 2025, the preliminary report was released, which stated that "there are no significant technological barriers" to implementing the ban. In late July 2025, Google warned that it would sue the Australian government if YouTube was included in the ban. On 30 July, the government announced that it would extend its social media age limit to include YouTube, following advice from Grant. On 30 July 2025, the minister for communications, Anika Wells, published the Online Safety (Age-Restricted Social Media Platforms) Rules 2025, which specify exactly which types of social media platforms will be banned for certain users. On 31 August 2025, the full report was released, which stated that it would technically be possible to implement the ban; however, coordination among different services is required to successfully implement it. It also highlighted the benefits and flaws of different methods of age verification. On 16 September 2025, it was announced that the eSafety Commissioner will be able to take legal action against social media companies that have not pursued reasonable steps to bar users under the age of 16, and that fines can range up to A$49.5 million against these companies in court. On 19 November 2025, Meta announced that from 4 December their platforms (Instagram, Facebook, and Threads) would be removing users under the age of 16 ahead of the 10 December deadline. Users will be able to scan a face or provide an identity document to prove their age. On 21 November 2025, the eSafety Commissioner announced that the live-streaming platform Twitch will be included in the ban, but that Pinterest would not be. In December 2025, eSafety Commissioner Julie Inman Grant suggested efforts to block users include use by social media companies of various "signals" to identify children that are

    Read more →
  • Vulnerabilities Equities Process

    Vulnerabilities Equities Process

    The Vulnerabilities Equities Process (VEP) is a process used by the U.S. federal government to determine on a case-by-case basis how it should treat zero-day computer security vulnerabilities: whether to disclose them to the public to help improve general computer security, or to keep them secret for offensive use against the government's adversaries. The VEP was first developed during the period 2008–2009, but only became public in 2016, when the government released a redacted version of the VEP in response to a FOIA request by the Electronic Frontier Foundation. Following public pressure for greater transparency in the wake of the Shadow Brokers affair, the U.S. government made a more public disclosure of the VEP process in November 2017. == Participants == According to the VEP plan published in 2017, the Equities Review Board (ERB) is the primary forum for interagency deliberation and determinations concerning the VEP. The ERB meets monthly, but may also be convened sooner if an immediate need arises. The ERB consists of representatives from the following agencies: Office of Management and Budget Office of the Director of National Intelligence (including the Intelligence Community-Security Coordination Center) United States Department of the Treasury United States Department of State United States Department of Justice (including the Federal Bureau of Investigation and the National Cyber Investigative Joint Task Force) Department of Homeland Security (including the National Cybersecurity and Communications Integration Center and the United States Secret Service) United States Department of Energy United States Department of Defense (to include the National Security Agency, including Information Assurance and Signals Intelligence elements), United States Cyber Command, and DoD Cyber Crime Center) United States Department of Commerce Central Intelligence Agency The National Security Agency serves as the executive secretariat for the VEP. == Process == According to the November 2017 version of the VEP, the process is as follows: === Submission and notification === When an agency finds a vulnerability, it will notify the VEP secretariat as soon as is possible. The notification will include a description of the vulnerability and the vulnerable products or systems, together with the agency's recommendation to either disseminate or restrict the vulnerability information. The secretariat will then notify all participants of the submission within one business day, requesting them to respond if they have an relevant interest. === Equity and discussions === An agency expressing an interest must indicate whether it concurs with the original recommendation to disseminate or restrict within five business days. If it does not, it will hold discussions with the submitting agency and the VEP secretariat within seven business days to attempt to reach consensus. If no consensus is reached, the participants will suggest options for the Equities Review Board. === Determination to disseminate or restrict === Decisions whether to disclose or restrict a vulnerability should be made quickly, in full consultation with all concerned agencies, and in the overall best interest of the competing interests of the missions of the U.S. government. As far as possible, determinations should be based on rational, objective methodologies, taking into account factors such as prevalence, reliance, and severity. If the review board members cannot reach consensus, they will vote on a preliminary determination. If an agency with an equity disputes that decision, they may, by providing notice to the VEP secretariat, elect to contest the preliminary determination. If no agency contests a preliminary determination, it will be treated as a final decision. === Handling and follow-on actions === If vulnerability information is released, this will be done as quickly as possible, preferably within seven business days. Disclosure of vulnerabilities will be conducted according to guidelines agreed on by all members. The submitting agency is presumed to be most knowledgeable about the vulnerability and, as such, will be responsible for disseminating vulnerability information to the vendor. The submitting agency may elect to delegate dissemination responsibility to another agency on its behalf. The releasing agency will promptly provide a copy of the disclosed information to the VEP secretariat for record keeping. Additionally, the releasing agency is expected to follow up so the ERB can determine whether the vendor's action meets government requirements. If the vendor chooses not to address a vulnerability, or is not acting with urgency consistent with the risk of the vulnerability, the releasing agency will notify the secretariat, and the government may take other mitigation steps. == Criticism == The VEP process has been criticized for a number of deficiencies, including restriction by non-disclosure agreements, lack of risk ratings, special treatment for the NSA, and less than whole-hearted commitment to disclosure as the default option. == UK equivalent == British intelligence agencies—GCHQ in particular—follow a similar approach, also known as the Equities Process, to determine whether to disclose or retain security vulnerabilities. The Investigatory Powers Act 2016 was amended in 2022 to bring oversight of the operation of the process within the remit of the Investigatory Powers Commissioner. Details of the process were made public in 2018.

    Read more →
  • Merit Network

    Merit Network

    Merit Network, Inc., is a nonprofit member-governed organization providing high-performance computer networking and related services to educational, government, health care, and nonprofit organizations, primarily in Michigan. Created in 1966, Merit operates the longest running regional computer network in the United States. == Organization == Created in 1966 as the Michigan Educational Research Information Triad by Michigan State University (MSU), the University of Michigan (U-M), and Wayne State University (WSU), Merit was created to investigate resource sharing by connecting the mainframe computers at these three Michigan public research universities. Merit's initial three node packet-switched computer network was operational in October 1972 using custom hardware based on DEC PDP-11 minicomputers and software developed by the Merit staff and the staffs at the three universities. Over the next dozen years the initial network grew as new services such as dial-in terminal support, remote job submission, remote printing, and file transfer were added; as gateways to the national and international Tymnet, Telenet, and Datapac networks were established, as support for the X.25 and TCP/IP protocols was added; as additional computers such as WSU's MVS system and the UM's electrical engineering's VAX running UNIX were attached; and as new universities became Merit members. Merit's involvement in national networking activities started in the mid-1980s with connections to the national supercomputing centers and work on the 56 kbit/s National Science Foundation Network (NSFNET), the forerunner of today's Internet. From 1987 until April 1995, Merit re-engineered and managed the NSFNET backbone service. MichNet, Merit's regional network in Michigan was attached to NSFNET and in the early 1990s Merit began extending "the Internet" throughout Michigan, offering both direct connect and dial-in services, and upgrading the statewide network from 56 kbit/s to 1.5 Mbit/s, and on to 45, 155, 622 Mbit/s, and eventually 1 and 10 Gbit/s. In 2003 Merit began its transition to a facilities based network, using fiber optic facilities that it shares with its members, that it purchases or leases under long-term agreements, or that it builds. In addition to network connectivity services, Merit offers a number of related services within Michigan and beyond, including: Internet2 connectivity, VPN, Network monitoring, Voice over IP (VOIP), Cloud storage, E-mail, Domain Name, Network Time, VMware and Zimbra software licensing, Colocation, and professional development seminars, workshops, classes, conferences, and meetings. == History == === Creating the network: 1966 to 1973 === The Michigan Educational Research Information Triad (MERIT) was formed in the fall of 1966 by Michigan State University (MSU), University of Michigan (U-M), and Wayne State University (WSU). More often known as the Merit Computer Network or simply Merit, it was created to design and implement a computer network connecting the mainframe computers at the universities. In the fall of 1969, after funding for the initial development of the network had been secured, Bertram Herzog was named director for MERIT. Eric Aupperle was hired as senior engineer, and was charged with finding hardware to make the network operational. The National Science Foundation (NSF) and the State of Michigan provided the initial funding for the network. In June 1970, the Applied Dynamics Division of Reliance Electric in Saline, Michigan was contracted to build three Communication Computers or CCs. Each would consist of a Digital Equipment Corporation (DEC) PDP-11 computer, dataphone interfaces, and interfaces that would attach them directly to the mainframe computers. The cost was to be slightly less than the $300,000 ($2,487,100, adjusted for inflation) originally budgeted. Merit staff wrote the software that ran on the CCs, while staff at each of the universities wrote the mainframe software to interface to the CCs. The first completed connection linked the IBM S/360-67 mainframe computers running the Michigan Terminal System at WSU and U-M, and was publicly demonstrated on December 14, 1971. The MSU node was completed in October 1972, adding a CDC 6500 mainframe running Scope/Hustler. The network was officially dedicated on May 15, 1973. === Expanding the network: 1974 to 1985 === In 1974, Herzog returned to teaching in the University of Michigan's Industrial Engineering Department, and Aupperle was appointed as director. Use of the all uppercase name "MERIT" was abandoned in favor of the mixed case "Merit". The first network connections were host to host interactive connections which allowed person to remote computer or local computer to remote computer interactions. To this, terminal to host connections, batch connections (remote job submission, remote printing, batch file transfer), and interactive file copy were added. And, in addition to connecting to host computers over custom hardware interfaces, the ability to connect to hosts or other networks over groups of asynchronous ports and via X.25 were added. Merit interconnected with Telenet (later SprintNet) in 1976 to give Merit users dial-in access from locations around the United States. Dial-in access within the U.S. and internationally was further expanded via Merit's interconnections to Tymnet, ADP's Autonet, and later still the IBM Global Network as well as Merit's own expanding network of dial-in sites in Michigan, New York City, and Washington, D.C. In 1978, Western Michigan University (WMU) became the fourth member of Merit (prompting a name change, as the acronym Merit no longer made sense as the group was no longer a triad). To expand the network, the Merit staff developed new hardware interfaces for the Digital PDP-11 based on printed circuit technology. The new system became known as the Primary Communications Processor (PCP), with the earliest PCPs connecting a PDP-10 located at WMU and a DEC VAX running UNIX at U-M's Electrical Engineering department. A second hardware technology initiative in 1983 produced the smaller Secondary Communication Processors (SCP) based on DEC LSI-11 processors. The first SCP was installed at the Michigan Union in Ann Arbor, creating UMnet, which extended Merit's network connectivity deeply into the U-M campus. In 1983 Merit's PCP and SCP software was enhanced to support TCP/IP and Merit interconnected with the ARPANET. === National networking, NSFNET, and the Internet: 1986 to 1995 === In 1986 Merit engineered and operated leased lines and satellite links that allowed the University of Michigan to access the supercomputing facilities at Pittsburgh, San Diego, and NCAR. In 1987, Merit, IBM and MCI submitted a winning proposal to NSF to implement a new NSFNET backbone network. The new NSFNET backbone network service began July 1, 1988. It interconnected supercomputing centers around the country at 1.5 megabits per second (T1), 24 times faster than the 56 kilobits-per-second speed of the previous network. The NSFNET backbone grew to link scientists and educators on university campuses nationwide and connect them to their counterparts around the world. The NSFNET project caused substantial growth at Merit, nearly tripling the staff and leading to the establishment of a new 24-hour Network Operations Center at the U-M Computer Center. In September 1990 in anticipation of the NSFNET T3 upgrade and the approaching end of the 5-year NSFNET cooperative agreement, Merit, IBM, and MCI formed Advanced Network and Services (ANS), a new non-profit corporation with a more broadly based Board of Directors than the Michigan-based Merit Network. Under its cooperative agreement with NSF, Merit remained ultimately responsible for the operation of NSFNET, but subcontracted much of the engineering and operations work to ANS. In 1991 the NSFNET backbone service was expanded to additional sites and upgraded to a more robust 45 Mbit/s (T3) based network. The new T3 backbone was named ANSNet and provided the physical infrastructure used by Merit to deliver the NSFNET Backbone Service. On April 30, 1995, the NSFNET project came to an end, when the NSFNET backbone service was decommissioned and replaced by a new Internet architecture with commercial Internet service providers (ISPs) interconnected at Network Access Points provided by multiple providers across the country. === Bringing the Internet to Michigan: 1985 to 2001 === During the 1980s, Merit Network grew to serve eight member universities, with Oakland University joining in 1985 and Central Michigan University, Eastern Michigan University, and Michigan Technological University joining in 1987. In 1990, Merit's board of directors formally changed the organization's name to Merit Network, Inc., and created the name MichNet to refer to Merit's statewide network. The board also approved a staff proposal to allow organizations other than publicly supported universities, referred to as aff

    Read more →
  • Data preservation

    Data preservation

    Data preservation is the act of conserving and maintaining both the safety and integrity of data. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. Data can be described as the elements or units in which knowledge and information is created, and metadata are the summarizing subsets of the elements of data; or the data about the data. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data. == History == Most historical data collected over time has been lost or destroyed. War and natural disasters combined with the lack of materials and necessary practices to preserve and protect data has caused this. Usually, only the most important data sets were saved, such as government records and statistics, legal contracts and economic transactions. Scientific research and doctoral theses data have mostly been destroyed from improper storage and lack of data preservation awareness and execution. Over time, data preservation has evolved and has generated importance and awareness. We now have many different ways to preserve data and many different important organizations involved in doing so. The first digital data preservation storage solutions appeared in the 1950s, which were usually flat or hierarchically structured. While there were still issues with these solutions, it made storing data much cheaper, and more easily accessible. In the 1970s relational databases as well as spreadsheets appeared. Relational data bases structure data into tables using structured query languages which made them more efficient than the preceding storage solutions, and spreadsheets hold high volumes of numeric data which can be applied to these relational databases to produce derivative data. More recently, non-relational (non-structured query language) databases have appeared as complements to relational databases which hold high volumes of unstructured or semi-structured data. == Importance == The scope of data preservation is vast. Everything from governmental to business records to art essentially can be represented as data, and is amenable to be lost. This then leads to loss of human history, for perpetuity. Data can be lost on a small or independent scale whether it's personal data loss, or data loss within businesses and organizations, as well as on a larger or national or global scale which can negatively and potentially permanently affect things such as environmental protection, medical research, homeland security, public health and safety, economic development and culture. The mechanisms of data loss are also as many as they are varied, spanning from disaster, wars, data breaches, negligence, all the way through simple forgetting to natural decay. Ways in which data collections can be used when preserved and stored properly can be seen through the U.S. Geological Survey, which stores data collections on natural hazards, natural resources, and landscapes. The data collected by the Survey is used by federal and state land management agencies towards land use planning and management, and continually needs access to historical reference data. == Related Concepts == In contrast, data holdings are collections of gathered data that are informally kept, and not necessarily prepared for long-term preservation. For example, a collection or back-up of personal files. Data holdings are generally the storage methods used in the past when data has been lost due to environmental and other historical disasters. Furthermore, data retention differs from data preservation in the sense that by definition, to retain an object (data) is to hold or keep possession or use of the object. To preserve an object is to protect, maintain and keep up for future use. Retention policies often circle around when data should be deleted on purpose as well, and held from public access, while preservation prioritizes permanence and more widely shared access. Thus, data preservation exceeds the concept of having or possessing data or back up copies of data. Data preservation ensures reliable access to data by including back-up and recovery mechanisms that precede the event of a disaster or technological change. == Methods == === Digital === Digital preservation, is similar to data preservation, but is mainly concerned with technological threats, and solely digital data. Essentially digital data is a set of formal activities to enable ongoing or persistent use and access of digital data exceeding the occurrence of technological malfunction or change. Digital preservation is aware of the inevitable change in technology and protocols, and prepares for data that will need to be accessible across new types of technologies and platforms while the integrity of the data and metadata are being conserved. Technology, while providing great process in conserving data that may not have been possible in the past, is also changing at such a quick rate that digital data may not be accessible anymore due to the format being incompatible with new software. Without the use of data preservation much of our existing digital data is at risk. The majority of methods used towards data preservation today are digital methods, which are so far the most effective methods that exist. === Archives === Archives are a collection of historical documents and records. Archives contribute and work towards the preservation of data by collecting data that is well organized, while providing the appropriate metadata to confirm it. An example of an important data archive is The LONI Image Data Archive, which is an archive that collects data regarding clinical trials and clinical research studies. === Catalogues, directories and portals === Catalogues, directories and portals are consolidated resources which are kept by individual institutions, and are associated with data archives and holdings. In other words, the data is not presented on the site, but instead might act as metadata and aggregators, and may administer thorough inventories. === Repositories === Repositories are places where data archives and holdings can be accessed and stored. The goal of repositories is to make sure that all requirements and protocols of archives and holdings are being met, and data is being certified to ensure data integrity and user trust. Single-site Repositories A repository that holds all data sets on a single site. An example of a major single-site repository the Data Archiving and Networking Services which is a repository which provides ongoing access to digital research resources for the Netherlands. Multi-Site Repositories A repository that hosts data set on multiple institutional sites. An example of a well known multi-site repository is OpenAIRE which is a repository that hosts research data and publications collaborating all of the EU countries and more. OpenAIRE promotes open scholarship and seeks to improves discover-ability and re-usability of data. Trusted Digital Repository A repository that seeks to provide reliable, trusted access over a long period of time. The repository can be single or multi-sited but must cooperate with the Reference Model for an Open Archival Information System, as well as adhere to a set of rules or attributes that contribute to its trust such as having persistent financial responsibility, organizational buoyancy, administrative responsibility security and safety. An example of a trusted digital repository is The Digital Repository of Ireland (DRI) which is a multi-site repository that hosts Ireland's humanity and social science data sets. === Cyber Infrastructures === Cyber infrastructures which consists of archive collections which are made available through the system of hardware, technologies, software, policies, services and tools. Cyber infrastructures are geared towards the sharing of data supporting peer-to-peer collaborations and a cultural community. An example of a major cyber-infrastructure is The Canadian Geo-spatial Data Infrastructure which provides access to spatial data in Canada.

    Read more →
  • Tableau de Concordance

    Tableau de Concordance

    The Tableau de Concordance was the main French diplomatic code used during World War I; the term also refers to any message sent using the code. It was a superenciphered four-digit code that was changed three times between 1 August 1914 and 15 January 1915. The Tableau de Concordance is considered superenciphered because there is more than one step required to use it. First, each word in a message is replaced by four digits via a codebook. These four digits are divided into three groups (one digit, two digits, one digit) so that when the whole message has been translated into code, the four-digit sets can be put together so it looks like the entire message is made up of two-digit pairs. This is called a "Straddle Gimmick." Then, in turn, each of these two digit pairs (and the single digits at the beginning and end) are replaced by two letters. The letters are then combined with no spaces for the final ciphertext. The manual for the Tableau de Concordance included the instruction that if there was not adequate time for completely enciphering the message, it should simply be sent in clear, because a partially enciphered message would have provided insight into the inner workings of the code.

    Read more →
  • BioBIKE

    BioBIKE

    BioBike(nee. BioLingua ) is a cloud-based, through-the-web programmable (Paas) symbolic biocomputing and bioinformatics platform that aims to make computational biology, and especially intelligent biocomputing (that is, the application of Artificial Intelligence to computational biology) accessible to research scientists who are not expert programmers. == Unique capabilities == BioBIKE is an integrated symbolic biocomputing and bioinformatics platform, built from the start as an entirely (what is now called) cloud-based architecture where all computing is done in remote servers, and all user access is accomplished through web browsers. BioBIKE has a built-in frame system in which all objects, data, and knowledge are represented. This enables code written either in the native Lisp, in the visual programming language, or systems of rules expressed in the SNARK theorem prover to access the whole of biological knowledge in an integrated manner. For its time (released in 2002) it was unique in permitting users to create fully functional biocomputing programs that run on the back-end servers entirely through the web browser UI. (In modern terms it was one of the first PaaS (Platform as a Service) systems, predating even Salesforce in this capability.) Initially this programming was carried out in raw Lisp, but Jeff Elhai's team at VCU, with NSF funding, created an entirely graphical programming environment on top of BioBIKE based upon the Boxer-style programming environments. Being a multi-headed, multi-threaded, multi-user, multi-tenancy cloud-based system, BioBIKE users were able to directly work together through their web browsers, remotely sharing the same listener and memory space. This permitted a unique sort of collaboration, discussed in Shrager (2007). A specialized offshoot of BioBIKE called "BioDeducta" includes SRI's SNARK theorem prover, offering unique "deductive biocomputing" capabilities. == Implementation == BioBIKE is open-source software implemented using the Lisp programming language. Continuing development takes place by the BioBIKE team centered at Virginia Commonwealth University . == History == BioBIKE was originally called "BioLingua", and was developed by Jeff Shrager at The Carnegie Inst. of Washington Dept. of Plant Biology, and JP Massar with funding from NASA's Astrobiology Division. Shrager and Massar wanted to create a web-based, multi-user Lisp Machine, specialized for bioinformatics. Other early contributors to the project included Mike Travers, and Jeff Elhai of VCU. Elhai obtained continuing funding from the National Science Foundation for the project, which was renamed BioBIKE. Elhai and colleagues added BioBIKE's unique visual programming language. Shrager, meanwhile, collaborated with Richard Waldinger at SRI to build SRI's (SNARK) theorem prover into BioBIKE, creating a deductive biocomputing system, called BioDeducta. == Instances == There used to be a number of BioBIKE verticals in different biological domains, including viral pathogens, cyanobacteria and other bacteria, Arabidopsis thaliana, and several others described in the references.

    Read more →
  • Data profiling

    Data profiling

    Data profiling is the process of examining the data available from an existing information source (e.g. a database or a file) and collecting statistics or informative summaries about that data. The purpose of these statistics may be to: Find out whether existing data can be easily used for other purposes Improve the ability to search data by tagging it with keywords, descriptions, or assigning it to a category Assess data quality, including whether the data conforms to particular standards or patterns Assess the risk involved in integrating data in new applications, including the challenges of joins Discover metadata of the source database, including value patterns and distributions, key candidates, foreign-key candidates, and functional dependencies Assess whether known metadata accurately describes the actual values in the source database Understanding data challenges early in any data intensive project, so that late project surprises are avoided. Finding data problems late in the project can lead to delays and cost overruns. Have an enterprise view of all data, for uses such as master data management, where key data is needed, or data governance for improving data quality. == Introduction == Data profiling refers to the analysis of information for use in a data warehouse in order to clarify the structure, content, relationships, and derivation rules of the data. Profiling helps to not only understand anomalies and assess data quality, but also to discover, register, and assess enterprise metadata. The result of the analysis is used to determine the suitability of the candidate source systems, usually giving the basis for an early go/no-go decision, and also to identify problems for later solution design. == How data profiling is conducted == Data profiling utilizes methods of descriptive statistics such as minimum, maximum, mean, mode, percentile, standard deviation, frequency, variation, aggregates such as count and sum, and additional metadata information obtained during data profiling such as data type, length, discrete values, uniqueness, occurrence of null values, typical string patterns, and abstract type recognition. The metadata can then be used to discover problems such as illegal values, misspellings, missing values, varying value representation, and duplicates. Different analyses are performed for different structural levels. E.g. single columns could be profiled individually to get an understanding of frequency distribution of different values, type, and use of each column. Embedded value dependencies can be exposed in a cross-columns analysis. Finally, overlapping value sets possibly representing foreign key relationships between entities can be explored in an inter-table analysis. Normally, purpose-built tools are used for data profiling to ease the process. The computational complexity increases when going from single column, to single table, to cross-table structural profiling. Therefore, performance is an evaluation criterion for profiling tools. == When is data profiling conducted? == According to Kimball, data profiling is performed several times and with varying intensity throughout the data warehouse developing process. A light profiling assessment should be undertaken immediately after candidate source systems have been identified and DW/BI business requirements have been satisfied. The purpose of this initial analysis is to clarify at an early stage if the correct data is available at the appropriate detail level and that anomalies can be handled subsequently. If this is not the case the project may be terminated. Additionally, more in-depth profiling is done prior to the dimensional modeling process in order assess what is required to convert data into a dimensional model. Detailed profiling extends into the ETL system design process in order to determine the appropriate data to extract and which filters to apply to the data set. Additionally, data profiling may be conducted in the data warehouse development process after data has been loaded into staging, the data marts, etc. Conducting data at these stages helps ensure that data cleaning and transformations have been done correctly and in compliance of requirements. == Benefits and examples == Data profiling can improve data quality, shorten the implementation cycle of major projects, and improve users' understanding of data. Discovering business knowledge embedded in data itself is one of the significant benefits derived from data profiling. It can improve data accuracy in corporate databases.

    Read more →
  • Locally recoverable code

    Locally recoverable code

    Locally recoverable codes are a family of error correction codes that were introduced first by D. S. Papailiopoulos and A. G. Dimakis and have been widely studied in information theory due to their applications related to distributive and cloud storage systems. An [ n , k , d , r ] q {\displaystyle [n,k,d,r]_{q}} LRC is an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} linear code such that there is a function f i {\displaystyle f_{i}} that takes as input i {\displaystyle i} and a set of r {\displaystyle r} other coordinates of a codeword c = ( c 1 , … , c n ) ∈ C {\displaystyle c=(c_{1},\ldots ,c_{n})\in C} different from c i {\displaystyle c_{i}} , and outputs c i {\displaystyle c_{i}} . == Overview == Erasure-correcting codes, or simply erasure codes, for distributed and cloud storage systems, are becoming more and more popular as a result of the present spike in demand for cloud computing and storage services. This has inspired researchers in the fields of information and coding theory to investigate new facets of codes that are specifically suited for use with storage systems. It is well-known that LRC is a code that needs only a limited set of other symbols to be accessed in order to restore every symbol in a codeword. This idea is very important for distributed and cloud storage systems since the most common error case is when one storage node fails (erasure). The main objective is to recover as much data as possible from the fewest additional storage nodes in order to restore the node. Hence, Locally Recoverable Codes are crucial for such systems. The following definition of the LRC follows from the description above: an [ n , k , r ] {\displaystyle [n,k,r]} -Locally Recoverable Code (LRC) of length n {\displaystyle n} is a code that produces an n {\displaystyle n} -symbol codeword from k {\displaystyle k} information symbols, and for any symbol of the codeword, there exist at most r {\displaystyle r} other symbols such that the value of the symbol can be recovered from them. The locality parameter satisfies 1 ≤ r ≤ k {\displaystyle 1\leq r\leq k} because the entire codeword can be found by accessing k {\displaystyle k} symbols other than the erased symbol. Furthermore, Locally Recoverable Codes, having the minimum distance d {\displaystyle d} , can recover d − 1 {\displaystyle d-1} erasures. == Definition == Let C {\displaystyle C} be a [ n , k , d ] q {\displaystyle [n,k,d]_{q}} linear code. For i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} , let us denote by r i {\displaystyle r_{i}} the minimum number of other coordinates we have to look at to recover an erasure in coordinate i {\displaystyle i} . The number r i {\displaystyle r_{i}} is said to be the locality of the i {\displaystyle i} -th coordinate of the code. The locality of the code is defined as An [ n , k , d , r ] q {\displaystyle [n,k,d,r]_{q}} locally recoverable code (LRC) is an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} linear code C ∈ F q n {\displaystyle C\in \mathbb {F} _{q}^{n}} with locality r {\displaystyle r} . Let C {\displaystyle C} be an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} -locally recoverable code. Then an erased component can be recovered linearly, i.e. for every i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} , the space of linear equations of the code contains elements of the form x i = f ( x i 1 , … , x i r ) {\displaystyle x_{i}=f(x_{i_{1}},\ldots ,x_{i_{r}})} , where i j ≠ i {\displaystyle i_{j}\neq i} . == Optimal locally recoverable codes == Theorem Let n = ( r + 1 ) s {\displaystyle n=(r+1)s} and let C {\displaystyle C} be an [ n , k , d ] q {\displaystyle [n,k,d]_{q}} -locally recoverable code having s {\displaystyle s} disjoint locality sets of size r + 1 {\displaystyle r+1} . Then An [ n , k , d , r ] q {\displaystyle [n,k,d,r]_{q}} -LRC C {\displaystyle C} is said to be optimal if the minimum distance of C {\displaystyle C} satisfies == Tamo–Barg codes == Let f ∈ F q [ x ] {\displaystyle f\in \mathbb {F} _{q}[x]} be a polynomial and let ℓ {\displaystyle \ell } be a positive integer. Then f {\displaystyle f} is said to be ( r {\displaystyle r} , ℓ {\displaystyle \ell } )-good if • f {\displaystyle f} has degree r + 1 {\displaystyle r+1} , • there exist distinct subsets A 1 , … , A ℓ {\displaystyle A_{1},\ldots ,A_{\ell }} of F q {\displaystyle \mathbb {F} _{q}} such that – for any i ∈ { 1 , … , ℓ } {\displaystyle i\in \{1,\ldots ,\ell \}} , f ( A i ) = { t i } {\displaystyle f(A_{i})=\{t_{i}\}} for some t i ∈ F q {\displaystyle t_{i}\in \mathbb {F} _{q}} , i.e., f {\displaystyle f} is constant on A i {\displaystyle A_{i}} , – # A i = r + 1 {\displaystyle \#A_{i}=r+1} , – A i ∩ A j = ∅ {\displaystyle A_{i}\cap A_{j}=\varnothing } for any i ≠ j {\displaystyle i\neq j} . We say that { A 1 , … , A ℓ {\displaystyle A_{1},\ldots ,A_{\ell }} } is a splitting covering for f {\displaystyle f} . === Tamo–Barg construction === The Tamo–Barg construction utilizes good polynomials. • Suppose that a ( r , ℓ ) {\displaystyle (r,\ell )} -good polynomial f ( x ) {\displaystyle f(x)} over F q {\displaystyle \mathbb {F} _{q}} is given with splitting covering i ∈ { 1 , … , ℓ } {\displaystyle i\in \{1,\ldots ,\ell \}} . • Let s ≤ ℓ − 1 {\displaystyle s\leq \ell -1} be a positive integer. • Consider the following F q {\displaystyle \mathbb {F} _{q}} -vector space of polynomials V = { ∑ i = 0 s g i ( x ) f ( x ) i : deg ⁡ ( g i ( x ) ) ≤ deg ⁡ ( f ( x ) ) − 2 } . {\displaystyle V=\left\{\sum _{i=0}^{s}g_{i}(x)f(x)^{i}:\deg(g_{i}(x))\leq \deg(f(x))-2\right\}.} • Let T = ⋃ i = 1 ℓ A i {\textstyle T=\bigcup _{i=1}^{\ell }A_{i}} . • The code { ev T ⁡ ( g ) : g ∈ V } {\displaystyle \{\operatorname {ev} _{T}(g):g\in V\}} is an ( ( r + 1 ) ℓ , ( s + 1 ) r , d , r ) {\displaystyle ((r+1)\ell ,(s+1)r,d,r)} -optimal locally coverable code, where ev T {\displaystyle \operatorname {ev} _{T}} denotes evaluation of g {\displaystyle g} at all points in the set T {\displaystyle T} . === Parameters of Tamo–Barg codes === • Length. The length is the number of evaluation points. Because the sets A i {\displaystyle A_{i}} are disjoint for i ∈ { 1 , … , ℓ } {\displaystyle i\in \{1,\ldots ,\ell \}} , the length of the code is | T | = ( r + 1 ) ℓ {\displaystyle |T|=(r+1)\ell } . • Dimension. The dimension of the code is ( s + 1 ) r {\displaystyle (s+1)r} , for s {\displaystyle s} ≤ ℓ − 1 {\displaystyle \ell -1} , as each g i {\displaystyle g_{i}} has degree at most deg ⁡ ( f ( x ) ) − 2 {\displaystyle \deg(f(x))-2} , covering a vector space of dimension deg ⁡ ( f ( x ) ) − 1 = r {\displaystyle \deg(f(x))-1=r} , and by the construction of V {\displaystyle V} , there are s + 1 {\displaystyle s+1} distinct g i {\displaystyle g_{i}} . • Distance. The distance is given by the fact that V ⊆ F q [ x ] ≤ k {\displaystyle V\subseteq \mathbb {F} _{q}[x]_{\leq k}} , where k = r + 1 − 2 + s ( r + 1 ) {\displaystyle k=r+1-2+s(r+1)} , and the obtained code is the Reed-Solomon code of degree at most k {\displaystyle k} , so the minimum distance equals ( r + 1 ) ℓ − ( ( r + 1 ) − 2 + s ( r + 1 ) ) {\displaystyle (r+1)\ell -((r+1)-2+s(r+1))} . • Locality. After the erasure of the single component, the evaluation at a i ∈ A i {\displaystyle a_{i}\in A_{i}} , where | A i | = r + 1 {\displaystyle |A_{i}|=r+1} , is unknown, but the evaluations for all other a ∈ A i {\displaystyle a\in A_{i}} are known, so at most r {\displaystyle r} evaluations are needed to uniquely determine the erased component, which gives us the locality of r {\displaystyle r} . To see this, g {\displaystyle g} restricted to A j {\displaystyle A_{j}} can be described by a polynomial h {\displaystyle h} of degree at most deg ⁡ ( f ( x ) ) − 2 = r + 1 − 2 = r − 1 {\displaystyle \deg(f(x))-2=r+1-2=r-1} thanks to the form of the elements in V {\displaystyle V} (i.e., thanks to the fact that f {\displaystyle f} is constant on A j {\displaystyle A_{j}} , and the g i {\displaystyle g_{i}} 's have degree at most deg ⁡ ( f ( x ) ) − 2 {\displaystyle \deg(f(x))-2} ). On the other hand | A j ∖ { a j } | = r {\displaystyle |A_{j}\backslash \{a_{j}\}|=r} , and r {\displaystyle r} evaluations uniquely determine a polynomial of degree r − 1 {\displaystyle r-1} . Therefore h {\displaystyle h} can be constructed and evaluated at a j {\displaystyle a_{j}} to recover g ( a j ) {\displaystyle g(a_{j})} . === Example of Tamo–Barg construction === We will use x 5 ∈ F 41 [ x ] {\displaystyle x^{5}\in \mathbb {F} _{41}[x]} to construct [ 15 , 8 , 6 , 4 ] {\displaystyle [15,8,6,4]} -LRC. Notice that the degree of this polynomial is 5, and it is constant on A i {\displaystyle A_{i}} for i ∈ { 1 , … , 8 } {\displaystyle i\in \{1,\ldots ,8\}} , where A 1 = { 1 , 10 , 16 , 18 , 37 } {\displaystyle A_{1}=\{1,10,16,18,37\}} , A 2 = 2 A 1 {\displaystyle A_{2}=2A_{1}} , A 3 = 3 A 1 {\displaystyle A_{3}=3A_{1}} , A 4 = 4 A 1 {\displaystyle A_{4}=4A_{1}} , A 5 = 5 A 1 {\displaystyle A_{5}=5A_{1}} , A 6 = 6 A 1 {\displaystyle A_{6}=6A_{1}}

    Read more →
  • Social collaboration

    Social collaboration

    Social collaboration refers to processes that help multiple people or groups interact and share information to achieve common goals. Such processes find their 'natural' environment on the Internet, where collaboration and social dissemination of information are made easier by current innovations and the proliferation of the web. Sharing concepts on a digital collaboration environment often facilitates a "brainstorming" process, where new ideas may emerge due to the varied contributions of individuals. These individuals may hail from different walks of life, different cultures and different age groups, their diverse thought processes help in adding new dimensions to ideas, dimensions that previously may have been missed. A crucial concept behind social collaboration is that 'ideas are everywhere.' Individuals are able to share their ideas in an unrestricted environment as anyone can get involved and the discussion is not limited to only those who have domain knowledge. Social collaboration is also known as enterprise social networking, and the products to support it are often branded enterprise social networks (ESNs). It is important that we understand the rhythm of social collaboration. There needs to be a balance, with ease to move from focused solitary work to brainstorming for problem solving in group work. This critical balance can be achieved by creating structures or a work environment where it is not too rigid to prevent brainstorming in group work nor too loose to result in total chaos. Social collaboration should happen at the edge of chaos. Work practices should support social collaboration. The most effective environment is one that supports opportunistic planning. Opportunistic planning provides a general plan but then gives enough room for flexibility to change activities and tasks until the last moment. This way, people are able to cope up with unforeseen developments and not throwing away everything with one grand plan. == Comparison to social networking == Social collaboration is related to social networking, with the distinction that while social networking is individual-centric, social collaboration is entirely group-centric. Generally speaking, social networking means socializing for personal, professional or entertainment purposes, for example, LinkedIn and Facebook. Social collaboration, on the other hand, means working socially to achieve a common goal, for example, GitHub and Quora. Social networking services generally focus on individuals sharing messages in a more-or-less undirected way and receiving messages from many sources into a single personalized activity feed. Social collaboration services, on the other hand, focus on the identification of groups and collaboration spaces in which messages are explicitly directed at the group and the group activity feed is seen the same way by everyone. Social collaboration may refer to time-bound collaborations with an explicit goal to be completed or perpetual collaborations in which the goal is knowledge sharing (e.g. community of practice, online community). == Comparison to crowdsourcing == Social collaboration is similar to crowdsourcing as it involves individuals working together towards a common goal. Crowdsourcing is a method for harnessing specific information from a large, diverse group of people. Unlike social collaboration, which involves much communication and cooperation among a large group of people, crowdsourcing is more like individuals working towards the common goal relatively independently. Therefore, the process of working involves less communication. Andrea Grover, curator of a crowdsourcing art show, explained that collaboration among individuals is an appealing experience, because participation is "a low investment, with the possibility of a high return." == Social collaboration software == Notable social collaboration software includes Glip messaging, Google Apps, Knowledge Plaza Electronic Document System and Social Intranet, Microsoft Lync social collaboration tool for businesses, Slack, Weekdone for managers, and Wrike. == Future == Social collaboration is going to be used as a tool in companies to enhance productivity. Social workers could be able to use social collaboration tools to manage personal tasks, professional projects and social networks with other colleagues within the same organization. Social collaboration will serve as a platform to get people involved and connected. This kind of platform provides a spiritual training practice for social workers. Social collaboration software could help enhance the communication between customers and employees and build trust in the organization. When we need real-time chat, it would be excellent to include every participant in a shared and archived forum which keeps a record of important information and logs. So collaborators need not worry about losing important records while working towards the common goal. The interactive communication and synchronous environment promote understanding among colleagues. Collaboration helps in building strong relationships between workers, which in turn leads to faster problem solving. The close connection between workers and customers creates a scalable organization which naturally increases the trust and faith that customers have in the company. Therefore, the interactive customer relationship levels up customer satisfaction in ways that traditional collaboration methods cannot. Apart from its effect on the way work will be conducted in the future, social collaboration will also affect society. In the coming years social collaboration will be the driving force in societal change as more and more people work together to get their vision across to governments and governing agencies. An example of this is Change.org, an online petition tool where users can help bring their government's attention to pressing social issues that need to be addressed.

    Read more →
  • Pixel

    Pixel

    In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable physical element of a raster image or the smallest controllable element of a display device or dot matrix printer. Pixels are arranged in a regular, two-dimensional grid, and each pixel serves as a sample of an original image, with a greater number of samples typically providing more accurate representations. Each pixel possesses a specific intensity or color, often composed of three or four component intensities, such as red, green, and blue (RGB), or cyan, magenta, yellow, and black (CMYK). The intensity of each pixel is variable, and in color imaging systems, these components are combined to produce a wide spectrum of colors. The concept of a picture element has existed since the early days of television, appearing as "Bildpunkt" in a 1888 German patent, and the term "pixel" has been used in various U.S. patents since 1911. In most digital display devices, pixels are the smallest element that can be manipulated through software. Each pixel is a sample of an original image; more samples typically provide more accurate representations of the original. The intensity of each pixel is variable. In color imaging systems, a color is typically represented by three or four component intensities such as red, green, and blue, or cyan, magenta, yellow, and black. In some contexts (such as descriptions of camera sensors), pixel refers to a single scalar element of a multi-component representation (called a photosite in the camera sensor context, although sensel 'sensor element' is sometimes used), while in yet other contexts (like MRI) it may refer to a set of component intensities for a spatial position. Software on early consumer computers was necessarily rendered at a low resolution, with large pixels visible to the naked eye; graphics made under these limitations may be called pixel art, especially in reference to video games. Modern computers and displays, however, can easily render orders of magnitude more pixels than was previously possible, necessitating the use of large measurements like the megapixel (one million pixels). == Etymology == The word pixel is a combination of pix (from "pictures", shortened to "pics") and el (for "element"); similar formations with 'el' include the words voxel 'volume pixel', and texel 'texture pixel'. The word pix appeared in Variety magazine headlines in 1932, as an abbreviation for the word pictures, in reference to movies. By 1938, "pix" was being used in reference to still pictures by photojournalists. The word "pixel" was first published in 1965 by Frederic C. Billingsley of JPL, to describe the picture elements of scanned images from space probes to the Moon and Mars. Billingsley had learned the word from Keith E. McFarland, at the Link Division of General Precision in Palo Alto, who in turn said he did not know where it originated. McFarland said simply it was "in use at the time" (c. 1963). The concept of a "picture element" dates to the earliest days of television, for example as "Bildpunkt" (the German word for pixel, literally 'picture point') in the 1888 German patent of Paul Nipkow. According to various etymologies, the earliest publication of the term picture element itself was in Wireless World magazine in 1927, though it had been used earlier in various U.S. patents filed as early as 1911. Some authors explain pixel as picture cell, as early as 1972. In graphics and in image and video processing, pel is often used instead of pixel. For example, IBM used it in their Technical Reference for the original PC. Pixilation, spelled with a second i, is an unrelated filmmaking technique that dates to the beginnings of cinema, in which live actors are posed frame by frame and photographed to create stop-motion animation. An archaic British word meaning "possession by spirits (pixies)", the term has been used to describe the animation process since the early 1950s; various animators, including Norman McLaren and Grant Munro, are credited with popularizing it. == Technical == A pixel is generally thought of as the smallest single component of a digital image. However, the definition is highly context-sensitive. For example, there can be "printed pixels" in a page, or pixels carried by electronic signals, or represented by digital values, or pixels on a display device, or pixels in a digital camera (photosensor elements). This list is not exhaustive and, depending on context, synonyms include pel, sample, byte, bit, dot, and spot. Pixels can be used as a unit of measure such as: 2400 pixels per inch, 640 pixels per line, or spaced 10 pixels apart. The measures "dots per inch" (dpi) and "pixels per inch" (ppi) are sometimes used interchangeably, but have distinct meanings, especially for printer devices, where dpi is a measure of the printer's density of dot (e.g. ink droplet) placement. For example, a high-quality photographic image may be printed with 600 ppi on a 1200 dpi inkjet printer. Even higher dpi numbers, such as the 4800 dpi quoted by printer manufacturers since 2002, do not mean much in terms of achievable resolution. The more pixels used to represent an image, the closer the result can resemble the original. The number of pixels in an image is sometimes called the resolution, though resolution has a more specific definition. Pixel counts can be expressed as a single number, as in a "three-megapixel" digital camera, which has a nominal three million pixels, or as a pair of numbers, as in a "640 by 480 display", which has 640 pixels from side to side and 480 from top to bottom (as in a VGA display) and therefore has a total number of 640 × 480 = 307,200 pixels, or 0.3 megapixels. The pixels, or color samples, that form a digitized image (such as a JPEG file used on a web page) may or may not be in one-to-one correspondence with screen pixels, depending on how a computer displays an image. In computing, an image composed of pixels is known as a bitmapped image or a raster image. The word raster originates from television scanning patterns, and has been widely used to describe similar halftone printing and storage techniques. === Sampling patterns === For convenience, pixels are normally arranged in a regular two-dimensional grid. By using this arrangement, many common operations can be implemented by uniformly applying the same operation to each pixel independently. Other arrangements of pixels are possible, with some sampling patterns even changing the shape (or kernel) of each pixel across the image. For this reason, care must be taken when acquiring an image on one device and displaying it on another, or when converting image data from one pixel format to another. For example: Liquid-crystal displays (LCDs) typically use a staggered grid, where the red, green, and blue components are sampled at slightly different locations. Subpixel rendering is a technology which takes advantage of these differences to improve the rendering of text on LCD screens. The vast majority of color digital cameras use a Bayer filter, resulting in a regular grid of pixels where the color of each pixel depends on its position on the grid. A clipmap uses a hierarchical sampling pattern, where the size of the support of each pixel depends on its location within the hierarchy. Warped grids are used when the underlying geometry is non-planar, such as images of the earth from space. The use of non-uniform grids is an active research area, attempting to bypass the traditional Nyquist limit. Pixels on computer monitors are normally "square" (that is, have equal horizontal and vertical sampling pitch); pixels in other systems are often "rectangular" (that is, have unequal horizontal and vertical sampling pitch – oblong in shape), as are digital video formats with diverse aspect ratios, such as the anamorphic widescreen formats of the Rec. 601 digital video standard. === Resolution of computer monitors === Computer monitors (and TV sets) generally have a fixed native resolution. What it is depends on the monitor, and size. See below for historical exceptions. Computers can use pixels to display an image, often an abstract image that represents a GUI. The resolution of this image is called the display resolution and is determined by the video card of the computer. Flat-panel monitors (and TV sets), e.g. OLED or LCD monitors, or E-ink, also use pixels to display an image, and have a native resolution, and it should (ideally) be matched to the video card resolution. Each pixel is made up of triads, with the number of these triads determining the native resolution. On older, historically available, CRT monitors the resolution was possibly adjustable (still lower than what modern monitor achieve), while on some such monitors (or TV sets) the beam sweep rate was fixed, resulting in a fixed native resolution. Most CRT monitors do not have a fixed beam sweep rate, meaning they do not have a native resolution at all – instead they

    Read more →
  • Cryptographic nonce

    Cryptographic nonce

    In cryptography, a nonce is an arbitrary number that can be used just once in a cryptographic communication. It is often a random or pseudo-random number issued in an authentication protocol to ensure that each communication session is unique, and therefore that old communications cannot be reused in replay attacks. Nonces can also be useful as initialization vectors and in cryptographic hash functions. == Definition == A nonce is an arbitrary number used only once in a cryptographic communication, in the spirit of a nonce word. They are often random or pseudo-random numbers. Many nonces also include a timestamp to ensure exact timeliness, though this requires clock synchronisation between organisations. The addition of a client nonce ("cnonce") helps to improve the security in some ways as implemented in digest access authentication. To ensure that a nonce is used only once, it should be time-variant (including a suitably fine-grained timestamp in its value), or generated with enough random bits to ensure an insignificantly low chance of repeating a previously generated value. Some authors define pseudo-randomness (or unpredictability) as a requirement for a nonce. Nonce is a word dating back to Middle English for something only used once or temporarily (often with the construction "for the nonce"). It descends from the construction "then anes" ("the one [purpose]"). A false etymology claiming it to stand for "number used once" or similar is incorrect. == Usage == === Authentication === Authentication protocols may use nonces to ensure that old communications cannot be reused in replay attacks. For instance, nonces are used in HTTP digest access authentication to calculate an MD5 digest of the password. The nonces are different each time the 401 authentication challenge response code is presented, thus making replay attacks virtually impossible. The scenario of ordering products over the Internet can provide an example of the usefulness of nonces in replay attacks. An attacker could take the encrypted information and—without needing to decrypt—could continue to send a particular order to the supplier, thereby ordering products over and over again under the same name and purchase information. The nonce is used to give 'originality' to a given message so that if the company receives any other orders from the same person with the same nonce, it will discard those as invalid orders. A nonce may be used to ensure security for a stream cipher. Where the same key is used for more than one message and then a different nonce is used to ensure that the keystream is different for different messages encrypted with that key; often the message number is used. Secret nonce values are used by the Lamport signature scheme as a signer-side secret which can be selectively revealed for comparison to public hashes for signature creation and verification. === Hashing === Nonces are used in proof-of-work systems to vary the input to a cryptographic hash function so as to obtain a hash for a certain input that fulfils certain arbitrary conditions. In doing so, it becomes far more difficult to create a "desirable" hash than to verify it, shifting the burden of work onto one side of a transaction or system. For example, proof of work, using hash functions, was considered as a means to combat email spam by forcing email senders to find a hash value for the email (which included a timestamp to prevent pre-computation of useful hashes for later use) that had an arbitrary number of leading zeroes, by hashing the same input with a large number of values until a "desirable" hash was obtained. Similarly, the Bitcoin blockchain hashing algorithm can be tuned to an arbitrary difficulty by changing the required minimum/maximum value of the hash so that the number of bitcoins awarded for new blocks does not increase linearly with increased network computation power as new users join. This is likewise achieved by forcing Bitcoin miners to add nonce values to the value being hashed to change the hash algorithm output. As cryptographic hash algorithms cannot easily be predicted based on their inputs, this makes the act of blockchain hashing and the possibility of being awarded bitcoins something of a lottery, where the first "miner" to find a nonce that delivers a desirable hash is awarded bitcoins.

    Read more →
  • Data hub

    Data hub

    A data hub is a center of data exchange that is supported by data science, data engineering, and data warehouse technologies to interact with endpoints such as applications and algorithms. == Features == A data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data. A data hub differs from a data lake by homogenizing data and possibly serving data in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data such as de-duplication, quality, security, and a standardized set of query services. A data lake tends to store data in one place for availability, and allow/require the consumer to process or add value to the data. Data hubs are ideally the "go-to" place for data within an enterprise, so that many point-to-point connections between callers and data suppliers do not need to be made, and so that the data hub organization can negotiate deliverables and schedules with various data enclave teams, rather than being an organizational free-for-all as different teams try to get new services and features from many other teams.

    Read more →