Sentence extraction

Sentence extraction

Sentence extraction is a technique used for automatic summarization of a text. In this shallow approach, statistical heuristics are used to identify the most salient sentences of a text. Sentence extraction is a low-cost approach compared to more knowledge-intensive deeper approaches which require additional knowledge bases such as ontologies or linguistic knowledge. In short, sentence extraction works as a filter that allows only meaningful sentences to pass. The major downside of applying sentence-extraction techniques to the task of summarization is the loss of coherence in the resulting summary. Nevertheless, sentence extraction summaries can give valuable clues to the main points of a document and are frequently sufficiently intelligible to human readers. == Procedure == Usually, a combination of heuristics is used to determine the most important sentences within the document. Each heuristic assigns a (positive or negative) score to the sentence. After all heuristics have been applied, the highest-scoring sentences are included in the summary. The individual heuristics are weighted according to their importance. === Early approaches and some sample heuristics === Seminal papers which laid the foundations for many techniques used today have been published by Hans Peter Luhn in 1958 and H. P Edmundson in 1969. Luhn proposed to assign more weight to sentences at the beginning of the document or a paragraph. Edmundson stressed the importance of title-words for summarization and was the first to employ stop-lists in order to filter uninformative words of low semantic content (e.g. most grammatical words such as of, the, a). He also distinguished between bonus words and stigma words, i.e. words that probably occur together with important (e.g. the word form significant) or unimportant information. His idea of using key-words, i.e. words which occur significantly frequently in the document, is still one of the core heuristics of today's summarizers. With large linguistic corpora available today, the tf–idf value which originated in information retrieval, can be successfully applied to identify the key words of a text: If for example the word cat occurs significantly more often in the text to be summarized (TF = "term frequency") than in the corpus (IDF means "inverse document frequency"; here the corpus is meant by document), then cat is likely to be an important word of the text; the text may in fact be a text about cats.

Image-based modeling and rendering

In computer graphics and computer vision, image-based modeling and rendering (IBMR) methods rely on a set of two-dimensional images of a scene to generate a three-dimensional model and then render some novel views of this scene. The traditional approach of computer graphics has been used to create a geometric model in 3D and try to reproject it onto a two-dimensional image. Computer vision, conversely, is mostly focused on detecting, grouping, and extracting features (edges, faces, etc.) present in a given picture and then trying to interpret them as three-dimensional clues. Image-based modeling and rendering allows the use of multiple two-dimensional images in order to generate directly novel two-dimensional images, skipping the manual modeling stage. == Light modeling == Instead of considering only the physical model of a solid, IBMR methods usually focus more on light modeling. The fundamental concept behind IBMR is the plenoptic illumination function which is a parametrisation of the light field. The plenoptic function describes the light rays contained in a given volume. It can be represented with seven dimensions: a ray is defined by its position ( x , y , z ) {\displaystyle (x,y,z)} , its orientation ( θ , ϕ ) {\displaystyle (\theta ,\phi )} , its wavelength ( λ ) {\displaystyle (\lambda )} and its time ( t ) {\displaystyle (t)} : P ( x , y , z , θ , ϕ , λ , t ) {\displaystyle P(x,y,z,\theta ,\phi ,\lambda ,t)} . IBMR methods try to approximate the plenoptic function to render a novel set of two-dimensional images from another. Given the high dimensionality of this function, practical methods place constraints on the parameters in order to reduce this number (typically to 2 to 4). == IBMR methods and algorithms == View morphing generates a transition between images Panoramic imaging renders panoramas using image mosaics of individual still images Lumigraph relies on a dense sampling of a scene Space carving generates a 3D model based on a photo-consistency check

G.9972

G.9972 (also known as G.cx) is a Recommendation developed by ITU-T that specifies a coexistence mechanism for networking transceivers capable of operating over electrical power line wiring. It allows G.hn devices to coexist with other devices implementing G.9972 and operating on the same power line wiring. G.9972 received consent during the meeting of ITU-T Study Group 15, on October 9, 2009, and final approval on June 11, 2010. G.9972 specifies two mechanisms for coexistence between G.hn home networks and broadband over power lines (BPL) Internet access networks: Frequency-division multiplexing (FDM), in which the available spectrum is divided into two parts: frequencies below 10 or 14 MHz (specific value can be selected by the access network) are reserved for the access network, while frequencies above them are reserved for the in-home network. Time-division multiplexing (TDM), in which the available channel time is split equally between both networks. 50% of time slots are allocated for the access network, and 50% are allocated to the in-home network.

Clustered file system

A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage for each node). Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance. == Shared-disk file system == A shared-disk file system uses a storage area network (SAN) to allow multiple computers to gain direct disk access at the block level. Access control and translation from file-level operations that applications use to block-level operations used by the SAN must take place on the client node. The most common type of clustered file system, the shared-disk file system – by adding mechanisms for concurrency control – provides a consistent and serializable view of the file system, avoiding corruption and unintended data loss even when multiple clients try to access the same files at the same time. Shared-disk file-systems commonly employ some sort of fencing mechanism to prevent data corruption in case of node failures, because an unfenced device can cause data corruption if it loses communication with its sister nodes and tries to access the same information other nodes are accessing. The underlying storage area network may use any of a number of block-level protocols, including SCSI, iSCSI, HyperSCSI, ATA over Ethernet (AoE), Fibre Channel, network block device, and InfiniBand. There are different architectural approaches to a shared-disk filesystem. Some distribute file information across all the servers in a cluster (fully distributed). === Examples === == Distributed file systems == Distributed file systems do not share block level access to the same storage but use a network protocol. These are commonly known as network file systems, even though they are not the only file systems that use the network to send data. Distributed file systems can restrict access to the file system depending on access lists or capabilities on both the servers and the clients, depending on how the protocol is designed. The difference between a distributed file system and a distributed data store is that a distributed file system allows files to be accessed using the same interfaces and semantics as local files – for example, mounting/unmounting, listing directories, read/write at byte boundaries, system's native permission model. Distributed data stores, by contrast, require using a different API or library and have different semantics (most often those of a database). === Design goals === Distributed file systems may aim for "transparency" in a number of aspects. That is, they aim to be "invisible" to client programs, which "see" a system which is similar to a local file system. Behind the scenes, the distributed file system handles locating files, transporting data, and potentially providing other features listed below. Access transparency: clients are unaware that files are distributed and can access them in the same way as local files are accessed. Location transparency: a consistent namespace exists encompassing local as well as remote files. The name of a file does not give its location. Concurrency transparency: all clients have the same view of the state of the file system. This means that if one process is modifying a file, any other processes on the same system or remote systems that are accessing the files will see the modifications in a coherent manner. Failure transparency: the client and client programs should operate correctly after a server failure. Heterogeneity: file service should be provided across different hardware and operating system platforms. Scalability: the file system should work well in small environments (1 machine, a dozen machines) and also scale gracefully to bigger ones (hundreds through tens of thousands of systems). Replication transparency: Clients should not have to be aware of the file replication performed across multiple servers to support scalability. Migration transparency: files should be able to move between different servers without the client's knowledge. === History === The Incompatible Timesharing System used virtual devices for transparent inter-machine file system access in the 1960s. More file servers were developed in the 1970s. In 1976, Digital Equipment Corporation created the File Access Listener (FAL), an implementation of the Data Access Protocol as part of DECnet Phase II which became the first widely used network file system. In 1984, Sun Microsystems created the file system called "Network File System" (NFS) which became the first widely used Internet Protocol based network file system. Other notable network file systems are Andrew File System (AFS), Apple Filing Protocol (AFP), NetWare Core Protocol (NCP), and Server Message Block (SMB) which is also known as Common Internet File System (CIFS). In 1986, IBM announced client and server support for Distributed Data Management Architecture (DDM) for the System/36, System/38, and IBM mainframe computers running CICS. This was followed by the support for IBM Personal Computer, AS/400, IBM mainframe computers under the MVS and VSE operating systems, and FlexOS. DDM also became the foundation for Distributed Relational Database Architecture, also known as DRDA. There are many peer-to-peer network protocols for open-source distributed file systems for cloud or closed-source clustered file systems, e. g.: 9P, AFS, Coda, CIFS/SMB, DCE/DFS, WekaFS, Lustre, PanFS, Google File System, Mnet, Chord Project. === Examples === == Network-attached storage == Network-attached storage (NAS) provides both storage and a file system, like a shared disk file system on top of a storage area network (SAN). NAS typically uses file-based protocols (as opposed to block-based protocols a SAN would use) such as NFS (popular on UNIX systems), SMB/CIFS (Server Message Block/Common Internet File System) (used with MS Windows systems), AFP (used with Apple Macintosh computers), or NCP (used with OES and Novell NetWare). == Design considerations == === Avoiding single point of failure === The failure of disk hardware or a given storage node in a cluster can create a single point of failure that can result in data loss or unavailability. Fault tolerance and high availability can be provided through data replication of one sort or another, so that data remains intact and available despite the failure of any single piece of equipment. For examples, see the lists of distributed fault-tolerant file systems and distributed parallel fault-tolerant file systems. === Performance === A common performance measurement of a clustered file system is the amount of time needed to satisfy service requests. In conventional systems, this time consists of a disk-access time and a small amount of CPU-processing time. But in a clustered file system, a remote access has additional overhead due to the distributed structure. This includes the time to deliver the request to a server, the time to deliver the response to the client, and for each direction, a CPU overhead of running the communication protocol software. === Concurrency === Concurrency control becomes an issue when more than one person or client is accessing the same file or block and want to update it. Hence updates to the file from one client should not interfere with access and updates from other clients. This problem is more complex with file systems due to concurrent overlapping writes, where different writers write to overlapping regions of the file concurrently. This problem is usually handled by concurrency control or locking which may either be built into the file system or provided by an add-on protocol. == History == IBM mainframes in the 1970s could share physical disks and file systems if each machine had its own channel connection to the drives' control units. In the 1980s, Digital Equipment Corporation's TOPS-20 and OpenVMS clusters (VAX/ALPHA/IA64) included shared disk file systems.

Data exhaust

Data exhaust (also exhaust data) is the trail of data generated as a by-product of users' online activity, behaviour, and transactions, rather than data they deliberately create or submit. It forms part of a broader category of unconventional data that also includes geospatial, network, and time-series data, and may be useful for predictive analytics. Data exhaust can take the form of cookies, temporary files, log files, clickstream records and stored preferences. Actions such as visiting a web page, following a link, or dwelling on an element may all generate exhaust data that is recorded without the user's active awareness. Unlike primary content — which the user intentionally creates — exhaust data is a passive side effect of interaction. A bank, for example, might treat the amounts and parties involved in a transaction as primary data, while secondary data could include whether the transaction was carried out at a cash machine rather than a branch. == Uses == Data exhaust collected by companies is often information that is not immediately useful in isolation, but can be aggregated and analysed to improve products, personalise content, identify trends, and support quality control. Companies may also store exhaust data for future analysis or sell it to third parties. Shoshana Zuboff has described this practice as a core mechanism of what she terms surveillance capitalism, in which behavioural data generated by users is converted into predictive products. Kosciejew notes that large quantities of often raw data are collected in this way, much of which is never analysed. == Medical exhaust data == Many medical devices — including pacemakers, dialysis machines and surgical cameras — generate exhaust data as a by-product of their operation. The majority of this data is never captured or analysed, and is typically discarded once a procedure ends or a device completes its routine monitoring cycle. The potential use of data generated by implanted devices such as pacemakers raises additional legal and ethical questions around ownership and consent. Using electronic health records for research also creates challenges because of the volume of data involved, creating a need for automated algorithms to process it. == Privacy and regulation == The collection and distribution of data exhaust is not in itself illegal in most jurisdictions, but its use raises questions of privacy and informed consent. Steps commonly taken to address these concerns include data anonymisation, offering users an opt-out from the sale of their data, and publishing explicit privacy policies that disclose what data is collected and how it is used.

Record sealing

Record sealing is the process of making public records inaccessible to the public. In many cases, a person with a sealed record gains the legal right to deny or not acknowledge anything to do with the arrest and the legal proceedings from the case itself. Records are commonly sealed in a number of situations: Sealed birth records (typically after adoption or determination of paternity) Juvenile criminal records may be sealed Other types of cases involving juveniles may be sealed, anonymized, or pseudonymized ("impounded"); e.g., child sex offense or custody cases Cases using witness protection information may be partly sealed Cases involving trade secrets Cases involving state secrets == Filing under seal in US court == Normally, records should not be filed under seal without a court permission. However, FRCP 5.2 requires that sensitive text – like Social Security number, Taxpayer Identification Number, birthday, bank accounts, and children’s names – should be redacted off the filings made with the court and accompanying exhibits. A person making a redacted filing can file an unredacted copy under seal, or the Court can choose to order later that an additional filing be made under seal without redaction. Alternately, the filing party may ask the court’s permission to file some exhibits completely under seal. When the document is filed "under seal", it should have a clear indication for the court clerk to file it separately – most often by stamping words "Filed Under Seal" on the bottom of each page. Person making filing should also provide instructions to the court clerk that the document needs to be filed "under seal". Courts often have specific requirements to these filings in their Local Rules. == Difference from expungement == Expungement, which is a physical destruction, namely a complete erasure of one's criminal records, and therefore usually carries a higher standard, differs from record sealing, which is only to restrict the public's access to records, so that only certain law enforcement agencies or courts, under special circumstances, will have access to them. A record seal will greatly improve the chance of employment, as employers will not have access to damning records. There are occasions, like expungement, where one can truthfully state under oath that they have never been convicted before. Most of the time, a record seal has more relaxed requirements than an expungement. If an expungement is not allowed with a case, then sealing a record may be the best bet. Different states have different terms for what constitutes sealing of a record. == Cybersecurity incidents involving sealed records == Several cybersecurity incidents have demonstrated that sealed court documents are not always secure in practice, with vulnerabilities and data breaches exposing sensitive information. In January 2021, following the SolarWinds cyber attack, the U.S. Bankruptcy Court United States District Court for the District of Nevada announced that its Case Management/Electronic Case Files CM/ECF system had been potentially compromised. The judiciary stated that additional safeguards were being implemented to protect filings, and that the review of the incident and its impact was ongoing. Reports noted that the breach raised concerns about exposure of highly sensitive and sealed documents submitted through the CM/ECF system. In 2023, security researcher Jason Parker, following a tip from an activist, identified flaws in online court systems that exposed sealed records including confidential testimony and medical records through publicly accessible portals. In 2024, a cyber intrusion targeting attorneys in a civil case involving Representative Matt Gaetz led to the unauthorized access and leak of sealed depositions and related records. The breach exposed confidential testimony and financial records, some of which were later reported by news outlets, raising concerns about the security of electronically stored legal materials and the handling of sealed filings. In 2025, multiple reports confirmed that the federal judiciary's CM/ECF and PACER (law) filing system was compromised, exposing sealed indictments, confidential informant information, and other sensitive filings. Some courts temporarily reverted to paper-based filing to mitigate the risks of further disclosure. The FBI later confirmed that the breach had exposed sealed records, and investigators suspected foreign state actors were involved. == GAO publications referencing sealed records == Closed Criminal Plea and Sentencing Proceedings (1983) – Reviewed Department of Justice policies on closing plea and sentencing hearings. GAO noted that sealed transcripts should be unsealed once the reasons for closure no longer applied. Information on Plea Agreements and Settlements in Defense Procurement Fraud Cases (1992) – Examined outcomes of procurement fraud prosecutions. GAO observed that in some instances the results were sealed from public access. Military Recruiting: More Needs to Be Done to Better Screen Applicants and Detect Fraud (1999) – Investigated fraudulent enlistments in the armed forces. The report highlighted that sealed juvenile records often prevented recruiters from discovering prior offenses. Social Security Numbers: Governments Could Do More to Reduce Display in Public Records (2004) – Analyzed risks associated with SSN availability in state and local records. GAO pointed out that some categories of records, such as adoption proceedings, were sealed and less likely to expose identifiers. Social Security Numbers: Stronger Safeguards Needed to Protect Privacy (2005 testimony) – Testimony before Congress reiterating concerns over SSN exposure in public records, while noting that sealed categories (e.g., adoption) were exceptions. U.S. Supreme Court: Policies and Perspectives on Video and Audio Coverage of Appellate Court Proceedings (2016) – Surveyed appellate court policies on courtroom media coverage. The report acknowledged distinctions between public filings, confidential submissions, and sealed materials. Evictions: National Data Are Limited and Challenging to Collect (2024) – Examined nationwide eviction data. GAO reported that in some states eviction records may be sealed or expunged, limiting researchers' ability to compile datasets. DOD Fraud Risk Management: Enhanced Data and Collaboration Could Improve Efforts (2024) – Reviewed Department of Defense fraud-risk management. GAO noted that some adjudicative records in its dataset were sealed, restricting completeness of oversight data.

PitchYaGame

PitchYaGame or #PitchYaGame (sometimes abbreviated to PYG) is a volunteer movement hosted on the social media platform Twitter to showcase, and present awards for, independent video games from around the world. == Description == PitchYaGame is hosted on the social media platform Twitter to showcase independent video games from around the world. Video pitches are presented by developers in June and November each year, and use the hashtag #PitchYaGame to identify and reference news about the showcase and the individual pitches, and the presentation of awards. The showcase was founded in May 2020 by Liam Twose, with the mission of recognising independent video games, and "focused on empowering indie game developers to strengthen their position in the industry." Twose has made clear that PitchYaGame is a showcase and not a hardcore competition, with "[j]ust enough of a push to make sure people put their best pitch forward." The team now comprises Twose (@LiamTwose at Twitter), operations manager "Indie Game Lover" (@IndieGameLover), and host Sarah Clancy (@ImSarahNow). The pitches were originally made monthly, with entries split into a number of categories, but this proved unmanageable. PitchYaGame collaborator, Sarah Clancy reported that judging the many entries on a monthly basis was "difficult and unwieldy." Therefore, pitches were later switched to six monthly, "feature creep" was reduced, and awards streamlined into gold, silver, bronze, runners-up, and most viral. == Sponsorship == In June 2021, PitchYaGame prizes were sponsored by Xsolla, and in November 2021 by Aurora Punks and Cold Pixel. No cash prizes were available in 2022, as the organisers moved PitchYaGame into a less-competitive, "more showcase centric format". == Reception == In October 2020, Elijah Beahm at The Escapist wrote that "One of the greatest challenges for any game is landing a solid pitch. You have to sell people, maybe even a publisher, to take your idea seriously. Most of the time, it's an obfuscated process that leaves the average developer scratching their heads, but Liam Twose and his team behind #PitchYaGame, 'PYG' for short, are looking to change all that with some clever social engineering." In March 2021, Cameron Koch at GameSpot wrote that "Using the #PitchYaGame, thousands of indie developers tweeted out pitches for their games on November 2 as part of a social media contest, and the results are astounding." He went on to say that "There is no arguing with the results. According to Twose, around 1100-1300 games were shared with the hashtag, and some real gems look to have shined through." In November 2021, Stafano "Stef" Castelli at IGN Italia wrote that "I myself enjoyed 'browsing through' the competitors, discovering a handful of intriguing video games in development." (translated from Italian). In November 2022, Eric Bartelson at Premortem Games wrote that "It's a great way to get games noticed by fellow developers, but also publishers, investors and press." In June 2023, Mark Plunkett in Kotaku wrote about the impossibility of keeping up with all the video game releases, and described PitchYaGame, which has attracted over 10,000 pitches since 2020, as an "astoundingly simple idea" that has "become an increasingly useful spot to catch up on some excellent-looking games that we may have otherwise completely slept on."