AI Chatbot Soulmate

AI Chatbot Soulmate — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Noisy text analytics


    Noisy text analytics is a process of information extraction whose goal is to automatically extract structured or semistructured information from noisy unstructured text data. While text analytics is a growing and maturing field of great value because of the huge amounts of data being produced, the processing of noisy text is gaining in importance because many common applications produce noisy text data. Noisy unstructured text data is found in informal settings such as online chat, text messages, e-mails, message boards, newsgroups, blogs, wikis and web pages. Text produced by processing spontaneous speech with automatic speech recognition, or printed or handwritten text with optical character recognition, also contains processing noise.

    Text produced under such circumstances is typically highly noisy, containing spelling errors, abbreviations, non-standard words, false starts, repetitions, missing punctuation, missing letter case information, pause-filling words such as “um” and “uh”, and other texting and speech disfluencies. Such text is found in large amounts in contact centers, chat rooms, optical character recognition (OCR) of text documents, short message service (SMS) text, etc. Documents written in historical language can also be considered noisy with respect to today's knowledge of the language. Such text contains useful historical, religious and ancient medical knowledge. The nature of the noisy text produced in all these contexts warrants moving beyond traditional text analysis techniques.

    == Techniques for noisy text analysis ==

    Missing punctuation and the use of non-standard words can often hinder standard natural language processing tools such as part-of-speech tagging and parsing. Techniques both to learn from noisy data and to process it are only now being developed.

    == Possible sources of noisy text ==

    World Wide Web: Poorly written text is found in web pages, online chat, blogs, wikis, discussion forums and newsgroups. Most of this data is unstructured, and the style of writing is very different from, say, well-written news articles. Analysis of web data is important because it is a source for market buzz analysis, market review, trend estimation, etc. Also, because of the large amount of data, efficient methods are needed for information extraction, classification, automatic summarization and analysis.

    Contact centers: This is a general term for help desks, information lines and customer service centers operating in domains ranging from computer sales and support to mobile phones to apparel. On average, a person in the developed world interacts with a contact center agent at least once a week. A typical contact center agent handles over a hundred calls per day. Contact centers operate in various modes such as voice, online chat and e-mail. The contact center industry produces gigabytes of data in the form of e-mails, chat logs, voice conversation transcriptions, customer feedback, etc. The bulk of contact center data is voice conversations. Transcribing these with state-of-the-art automatic speech recognition results in text with a 30-40% word error rate. Further, even written modes of communication, such as online chat between customers and agents and interactions over e-mail, tend to be noisy. Analysis of contact center data is essential for customer relationship management, customer satisfaction analysis, call modeling, customer profiling, agent profiling, etc., and it requires sophisticated techniques to handle poorly written text.

    Printed documents: Many libraries, government organizations and national defence organizations have vast repositories of hard copy documents. To retrieve and process their content, these documents need to be processed using optical character recognition. In addition to printed text, they may also contain handwritten annotations. OCRed text can be highly noisy depending on the font size, quality of the print, etc. Word error rates can range from 2-3% to as high as 50-60%. Handwritten annotations can be particularly hard to decipher, and error rates can be quite high in their presence.

    Short Messaging Service (SMS): Language usage in computer-mediated discourse, such as chats, e-mails and SMS texts, differs significantly from the standard form of the language. The urge towards shorter messages, which allow faster typing, and the need for semantic clarity shape the structure of this non-standard form, known as the texting language.
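
The word error rates quoted above for speech transcripts and OCR output are conventionally computed as the word-level edit distance between a reference text and the system output, divided by the number of reference words. The following is a minimal sketch under that assumption; the function name and the whitespace tokenization are illustrative choices, not something this article specifies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, the value can exceed 1.0 when the output is much longer than the reference.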

  • Spotify Kids


    Spotify Kids is a Swedish kid-friendly music streaming service developed by Spotify. It offers curated content for children, including music, audiobooks, lullabies, and bedtime stories, while providing parents with parental controls. The service is only available to subscribers to Spotify's Premium Family subscription plan.

    == Function ==

    Spotify Kids allows children to browse Spotify under parental controls. Using the app, parents can view their children's listening history, block specific songs, and share playlists with their children. The app also includes sing-along songs, playlists designed for young children, and curated audiobooks, lullabies, and bedtime stories. Access is included in, and exclusive to, Spotify's Premium Family subscription plan. Users can configure the app for a specific age group upon first launch.

    The playlists on Spotify Kids are curated by groups including Discovery Kids, Nickelodeon, Universal Pictures, and The Walt Disney Company. All content on the Spotify Kids app is curated by editors. As of March 2021, there were roughly 8,000 songs available on the platform. The design of the app is colorful, and the user interface varies depending on the age group for which the app is configured. Spotify Kids is designed to comply with consent and data collection regulations for apps used by children. TechCrunch explains that it is "designed on a grand scale to drive subscriptions to Spotify's top-tier $14.99-per-month Premium Family Plan."

    == Release ==

    After being beta tested in Ireland in October 2019, Spotify Kids was released as a beta across the United Kingdom on February 11, 2020. It was later released in Sweden, Denmark, Australia, New Zealand, Mexico, Argentina, and Brazil. On March 31, 2021, it was made available in France, Canada, and the United States.

  • List of security assessment tools


    This is a list of available software and hardware tools that are designed for or are particularly suited to various kinds of security assessment and security testing.

    == Operating systems and tool suites ==

    Several operating systems and tool suites provide bundles of tools useful for various types of security assessment.

    === Operating system distributions ===

      • Kali Linux (formerly BackTrack), a penetration-test-focused Linux distribution based on Debian
      • Pentoo, a penetration-test-focused Linux distribution based on Gentoo
      • ParrotOS, a Linux distribution focused on penetration testing, forensics, and online anonymity

    == Tools ==

  • Certified social engineering prevention specialist


    Certified Social Engineering Prevention Specialist (CSEPS) is a social engineering security-awareness training and professional certification program originally developed by Kevin Mitnick and Alexis Kasperavičius.

    == Course structure ==

    The original CSEPS program was structured as a multi-module corporate security-awareness course designed to teach employees, managers, and IT personnel how social engineers manipulate human behavior to bypass technical security systems. The curriculum combined case studies, psychological analysis, attack demonstrations, pretexting exercises, and operational security scenarios. The course materials described social engineering as the exploitation of "the human factor" in information security and argued that traditional technical defenses alone were insufficient to protect organizations from deception-based attacks.

    The training program was divided into instructional modules covering topics such as:

      • social engineering methodology and threat analysis
      • intelligence gathering and reconnaissance
      • dumpster diving
      • pretexting
      • elicitation technique
      • telephone-system exploitation and caller-ID spoofing
      • psychological influence techniques
      • industrial espionage
      • identity theft
      • organizational vulnerabilities
      • security policy development and employee awareness training

    The course also analyzed historical and contemporary case studies involving information theft, corporate espionage, fraudulent wire transfers, and telephone-based impersonation attacks. Training exercises required participants to analyze how attackers established credibility, manipulated trust, overcame objections, and exploited organizational procedures.

    According to The Wall Street Journal, CSEPS was delivered as a two-day "boot camp" course costing approximately US$1,500 per attendee. Clients reportedly included the United States Air Force and the United States Marine Corps. The certification examination included multiple-choice and written-response sections dealing with social-engineering defense scenarios and mitigation strategies.

    == History ==

    In 2003, Mitnick and Kasperavičius partnered with the Florida-based IT training company Intense School Inc. to offer CSEPS classes throughout the United States. In 2020, Mitnick partnered with security-awareness training company KnowBe4, and elements of the original CSEPS material were incorporated into KnowBe4's social-engineering awareness training offerings.

  • Network Abstraction Layer


    The Network Abstraction Layer (NAL) is a part of the H.264/AVC and HEVC video coding standards. The main goal of the NAL is the provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non-conversational" (storage, broadcast, or streaming) applications. The NAL has achieved a significant improvement in application flexibility relative to prior video coding standards.

    == Introduction ==

    An increasing number of services and the growing popularity of high definition TV are creating greater needs for higher coding efficiency. Moreover, other transmission media such as cable modem, xDSL, or UMTS offer much lower data rates than broadcast channels, and enhanced coding efficiency can enable the transmission of more video channels or higher quality video representations within existing digital transmission capacities. Video coding for telecommunication applications has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements.

    The H.264/AVC and HEVC standards are designed for technical solutions including areas like broadcasting (over cable, satellite, cable modem, DSL, terrestrial, etc.), interactive or serial storage on optical and magnetic devices, conversational services, video-on-demand or multimedia streaming, multimedia messaging services, etc. Moreover, new applications may be deployed over existing and future networks. This raises the question of how to handle this variety of applications and networks.

    To address this need for flexibility and customizability, the design includes a NAL that formats the Video Coding Layer (VCL) representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media. The NAL is designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL for a broad variety of systems. The NAL facilitates the ability to map VCL data to transport layers such as:

      • RTP/IP for any kind of real-time wireline and wireless Internet services
      • File formats, e.g., ISO MP4 for storage and MMS
      • H.32X for wireline and wireless conversational services
      • MPEG-2 systems for broadcasting services, etc.

    The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the video coding standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte-stream and packet-format uses of NAL units, parameter sets, and access units. A short description of these concepts is given below.

    == NAL units ==

    The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each H.264/AVC NAL unit is a header byte that contains an indication of the type of data in the NAL unit. For HEVC the header was extended to two bytes. All the remaining bytes contain payload data of the type indicated by the header. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.
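
To make the one-byte and two-byte headers described above concrete, here is a minimal sketch that unpacks them into their bit fields. The field names and bit widths (`forbidden_zero_bit`, `nal_ref_idc`, `nal_unit_type`, `nuh_layer_id`, `nuh_temporal_id_plus1`) are taken from the H.264 and HEVC specifications rather than from this article, and the function names are hypothetical, so the layout should be checked against the standards before being relied on.

```python
def parse_h264_nal_header(first_byte: int) -> dict:
    """Split the single H.264/AVC NAL header byte into its three fields."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,  # 1 bit
        "nal_ref_idc": (first_byte >> 5) & 0x03,         # 2 bits
        "nal_unit_type": first_byte & 0x1F,              # 5 bits
    }


def parse_hevc_nal_header(byte0: int, byte1: int) -> dict:
    """Split the two HEVC NAL header bytes into their four fields."""
    return {
        "forbidden_zero_bit": (byte0 >> 7) & 0x01,               # 1 bit
        "nal_unit_type": (byte0 >> 1) & 0x3F,                    # 6 bits
        "nuh_layer_id": ((byte0 & 0x01) << 5) | (byte1 >> 3),    # 6 bits
        "nuh_temporal_id_plus1": byte1 & 0x07,                   # 3 bits
    }


if __name__ == "__main__":
    # 0x67 is a commonly seen first byte of an H.264 sequence parameter set.
    print(parse_h264_nal_header(0x67))
    # 0x40 0x01 is a commonly seen header of an HEVC video parameter set.
    print(parse_hevc_nal_header(0x40, 0x01))
```

The remaining bytes of the unit are payload whose interpretation depends on `nal_unit_type`.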

    == NAL Units in Byte-Stream Format Use ==

    Some systems require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC and HEVC specifications define a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit. A small amount of additional data (one byte per video picture) is also added so that decoders operating in systems that provide streams of bits without alignment to byte boundaries can recover the necessary alignment from the data in the stream. Additional data can also be inserted in the byte stream format that allows expansion of the amount of data to be sent and can aid in achieving more rapid byte alignment recovery, if desired.

    == NAL Units in Packet-Transport System Use ==

    In other systems (e.g., IP/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes.

    == VCL and Non-VCL NAL Units ==

    NAL units are classified into VCL and non-VCL NAL units. VCL NAL units contain the data that represents the values of the samples in the video pictures.

    Non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures).

    == Parameter Sets ==

    A parameter set contains shared configuration data that is carried in non-VCL NAL units. Parameter sets are typically reused when decoding many coded pictures within a video sequence. Each VCL NAL unit references a picture parameter set (PPS), which in turn references a sequence parameter set (SPS). There are two types of parameter sets:

      • Sequence parameter set (SPS), which specifies mostly constant configuration such as resolution, bit depth, or chroma format. (For a concrete implementation, see FFmpeg's SPS struct.)
      • Picture parameter set (PPS), which applies on top of an SPS, and specifies configuration such as QP offsets. (For a concrete implementation, see FFmpeg's PPS struct.)

    The sequence and picture parameter-set mechanism decouples the transmission of infrequently changing information from the transmission of coded representations of the values of the samples in the video pictures. Each VCL NAL unit contains an identifier that refers to the content of the relevant picture parameter set, and each picture parameter set contains an identifier that refers to the content of the relevant sequence parameter set. In this manner, a small amount of data (the identifier) can be used to refer to a larger amount of information (the parameter set) without repeating that information within each VCL NAL unit. Sequence and picture parameter sets can be sent well ahead of the VCL NAL units that they apply to, and can be repeated to provide robustness against data loss.
    In some applications, parameter sets may be sent within the channel that carries the VCL NAL units (termed "in-band" transmission). In other applications, it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself.

    == Access Units ==

    A set of NAL units in a specified form is referred to as an access unit. The decoding of each access unit results in one decoded picture. Each access unit contains a set of VCL NAL units that together compose a primary coded picture. It may also be prefixed with an access unit delimiter to aid in locating the start of the access unit. Some supplemental enhancement information containing data such as picture timing information may also precede the primary coded picture.

    The primary coded picture consists of a set of VCL NAL units consisting of slices or slice data partitions that represent the samples of the video picture. Following the primary coded picture may be some additional VCL NAL units that contain redundant representations of areas of the same video picture. These are referred to as redundant coded pictures, and are available for use by a decoder in recovering from loss or corruption of the data in the primary coded pictures. Decoders are not required to decode redundant coded pictures if they are present.

    Finally, if the coded picture is the last picture of a coded video sequence (a sequence of pictures that is independently decodable and uses only one sequence parameter set), an end of sequence NAL unit may be present to indicate the end of the sequence; and if the coded picture is the last coded picture in the entire NAL unit stream, an end of stream NAL unit may be present to
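
The byte-stream format described earlier in this entry can be illustrated with a small sketch that locates the three-byte start code prefix and then strips emulation prevention bytes. The concrete byte values (`00 00 01` for the prefix, `03` as the emulation prevention byte) come from the H.264/HEVC specifications, not from this article, and both function names are hypothetical. The sketch also assumes that a NAL unit never ends in a zero byte, so trailing zeros can be treated as padding.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def split_byte_stream(data: bytes) -> list:
    """Return the NAL units found between start code prefixes."""
    units = []
    pos = data.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = data.find(START_CODE_PREFIX, start)
        end = len(data) if nxt == -1 else nxt
        # Zero bytes before the next prefix are padding, not payload
        # (assumption: a NAL unit does not end with 0x00).
        units.append(data[start:end].rstrip(b"\x00"))
        pos = nxt
    return units


def remove_emulation_prevention(nal_unit: bytes) -> bytes:
    """Drop each 0x03 byte that directly follows two zero bytes."""
    out = bytearray()
    zero_run = 0
    for byte in nal_unit:
        if zero_run >= 2 and byte == 0x03:
            zero_run = 0  # skip the inserted byte and restart the count
            continue
        out.append(byte)
        zero_run = zero_run + 1 if byte == 0x00 else 0
    return bytes(out)


if __name__ == "__main__":
    stream = b"\x00\x00\x00\x01\x67\x42\x00\x00\x03\x01" + b"\x00\x00\x01\x65\x88"
    for unit in split_byte_stream(stream):
        print(unit.hex(), "->", remove_emulation_prevention(unit).hex())
```

In a packet-transport system such as RTP the first step would be unnecessary, since the transport already frames each NAL unit.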

  • Security.txt


    security.txt is an accepted standard for website security information that allows security researchers to report security vulnerabilities easily. The standard prescribes a text file named security.txt in a well-known location, similar in syntax to robots.txt but intended to be both machine- and human-readable, for those wishing to contact a website's owner about security issues. security.txt files have been adopted by Google, GitHub, LinkedIn, and Facebook.

    == History ==

    The Internet Draft was first submitted by Edwin Foudil in September 2017. At that time it covered four directives: "Contact", "Encryption", "Disclosure" and "Acknowledgement". Foudil expected to add further directives based on feedback. In addition, web security expert Scott Helme said he had seen positive feedback from the security community, while use among the top 1 million websites was "as low as expected right now".

    In 2019, the Cybersecurity and Infrastructure Security Agency (CISA) published a draft binding operational directive that requires all US federal agencies to publish a security.txt file within 180 days. The Internet Engineering Steering Group (IESG) issued a Last Call for security.txt in December 2019, which ended on January 6, 2020.

    A study in 2021 found that over ten percent of top-100 websites published a security.txt file, with the percentage of sites publishing the file decreasing as more websites were considered. The study also noted a number of discrepancies between the standard and the content of the file. In April 2022, security.txt was accepted by the Internet Engineering Task Force (IETF) as RFC 9116.

    == File format ==

    security.txt files can be served under the /.well-known/ directory (i.e. /.well-known/security.txt) or the top-level directory (i.e. /security.txt) of a website. The file must be served over HTTPS and in plaintext format.
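
As a rough illustration of the two locations and the "Name: value" line syntax, the sketch below builds the candidate HTTPS URLs for a host and parses a file body into fields. The helper names, the `example.com` host and all sample values are hypothetical; the sample uses the four directive names from the original draft as listed above. Lower-casing field names and ignoring `#` comment lines are simplifying assumptions, and signed files are not handled.

```python
def candidate_urls(host: str) -> list:
    """HTTPS locations to try, the /.well-known/ path first."""
    return [
        f"https://{host}/.well-known/security.txt",
        f"https://{host}/security.txt",
    ]


def parse_security_txt(text: str) -> dict:
    """Collect 'Name: value' lines into a dict of lists keyed by lower-cased name."""
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, separator, value = line.partition(":")
        if not separator:
            continue  # not a field line
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


if __name__ == "__main__":
    sample = """\
# Hypothetical file for illustration only
Contact: mailto:security@example.com
Contact: https://example.com/report
Encryption: https://example.com/pgp-key.txt
Disclosure: Full
Acknowledgement: https://example.com/thanks
"""
    print(candidate_urls("example.com"))
    print(parse_security_txt(sample))
```

Only the first colon on a line separates the name from the value, which is why `mailto:` and `https:` values survive intact.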

  • Digistar


    Digistar is the first computer graphics-based planetarium projection and content system. It was designed by Evans & Sutherland and released in 1983. The technology originally focused on accurate and high quality display of stars, including for the first time showing stars from points of view other than Earth's surface, travelling through the stars, and accurately showing celestial bodies from different times in the past and future. Beginning with the Digistar 3, the system projects full-dome video.

    == Projector ==

    Unlike modern full-dome systems, which use LCD, DLP, SXRD, or laser projection technology, the Digistar projection system was designed for projecting bright pinpoints of light representing stars. This was accomplished using a calligraphic display, a form of vector graphics, rather than raster graphics. The heart of the Digistar projector is a large cathode-ray tube (CRT). A phosphor plate is mounted atop the tube, and light is then dispersed by a large lens with a 160 degree field of view to cover the planetarium dome. The original lens bore the inscription: "August 1979 mfg. by Lincoln Optical Corp., L.A., CA for Evans and Sutherland Computer Corp., SLC, UT, Digital planetarium CRT projection lens, 43mm, f2.8, 160 degree field of view".

    The coordinates of the stars and wire-frame models to be displayed by the projector were stored in computer RAM in a display list. The display would read each set of coordinates in turn and drive the CRT's electron beam directly to those coordinates. If the electron beam was enabled while being moved, a line would be painted on the phosphor plate. Otherwise, the electron beam would be enabled once at its destination and a star would be painted. Once all coordinates in the display list had been processed, the display would repeat from the top of the display list.
    Thus, the shorter the display list, the more frequently the electron beam would refresh the charge on a given point on the phosphor plate, making the projection of the points brighter. In this way, the stars projected by Digistar were substantially brighter than could be achieved using a raster display, which has to touch every point on the phosphor plate before repeating. Likewise, the calligraphic technology allowed Digistar to have a darker black level than full-dome projectors, since the portions of the phosphor plate representing dark sky were never hit by the electron beam.

    As it is only one tube, with no pixelated color filter screen, the Digistar projector is monochromatic. The Digistar projects a bright, phosphorescent green, though many (including both visitors and planetarians) report they cannot distinguish between this green and white. Additionally, unlike a raster display, the calligraphic display is not discretized into pixels, so the displayed stars were a more realistic single spot of light, without the blocky or ropy artifacts that are hard to avoid with raster graphics. Due to the use of vector graphics, as opposed to raster imaging, the Digistar does not have the resolution issues that many full-dome systems have. Thanks to this, and the brightness of the CRT, only one projector is needed to project on the entire dome, whereas most full-dome systems require up to six raster projectors, depending on dome size.

    The projector in the original Digistar was housed in a square pyramid-shaped sheathing. When powered on, the four sides at the tip of the pyramid would recede into the housing, exposing the lens and appearing as a cut-off pyramid.

    As Digistar II was being developed, many planetaria were sold Digistar LEA projectors. The LEA, called Digistar 1.5 by many users, was effectively a prototype of the D2 projector, compatible with Digistar and upgradable to Digistar II.
    There are no significant differences in performance between the LEA and the true D2.

    == History ==

    Digistar was the brainchild of Stephen McAllister and Brent Watson, both of whom were long-time amateur astronomers and computer graphics engineers. In 1977, E&S had been consulting with Johnson Space Center regarding training simulators for astronauts. McAllister had been writing proof-of-concept software for this consultation, and in summer 1977 he entered the data for 400 bright stars and wrote the software to display them. McAllister and Watson both originally saw the system's purpose as celestial navigation training. Watson, who had until recently worked at Hansen Planetarium, asked his planetarium coworkers what they thought of a potential digital planetarium system, and the two then targeted the system toward planetaria.

    The primary goal of the planetarium system was to use computer graphics to overcome the limitation of traditional star ball technology, which only allowed display of star fields from the point of view of Earth's surface. By using computer graphics, the stars could be displayed from viewpoints in space, including simulating the appearance of space flight. Likewise, planets and moons within the Solar System could be displayed accurately for any time in history, from any point of view. The system used the locations of real stars from the Yale Bright Star Catalogue, as well as random stars.

    A laboratory prototype of Digistar was used to generate the star fields and tactical displays in the 1982 science fiction film Star Trek II: The Wrath of Khan. Filming was done directly from the Digistar display in the lab. ILM projected the effort would take two weeks, but in fact it took from late November 1981 until mid-February 1982. The last shot recorded was what became the first entirely computer-generated feature film sequence. It was the opening scene of the film, a rotating forward translation through a star field that lasted 3.5 minutes.
    It was recorded in one take, at a rate of one frame every 3.5 seconds, taking four hours for the shoot. The Digistar team members are credited in the film.

    After prototyping in labs at Evans and Sutherland, the team repeatedly used Salt Lake City's Hansen Planetarium at night to beta test the system. The Digistar team performed one week of shows at the planetarium as a fundraiser to benefit the planetarium. The company also later gave the planetarium an improved prototype Digistar to replace "Jake", the planetarium's aging Spitz planetarium projector.

    The first customer installation was at the newly constructed Universe Planetarium at the Science Museum of Virginia in 1983, the largest planetarium dome in the world at the time, for $595,000. By September 1986 there were four installed Digistars. Even at this point the long-term success of the product was very much in doubt, but as of 2019 Digistar has an installed base of over 550 planetaria.

    === Versions ===

      • Digistar (1983)
      • Digistar II (1995)
      • Digistar 3 (2002)
      • Digistar 4 (2010?)
      • Digistar 5 (2012)
      • Digistar 6 (2016)
      • Digistar 7 (2021)

    == Hardware ==

    Digistar was driven by a VAX-11/780 minicomputer, with custom graphics hardware related to the E&S Picture System 2. Later versions of Digistar 1 used a DEC MicroVAX 2, driving a custom version of a PS/300.

    The original Digistar and Digistar 2 had a physical control panel that was used for running the star shows. This control panel was approximately 3' x 4' and contained a keyboard, a 6 DOF joystick, and a large array of back-lit buttons. One button, used for moving the viewpoint forward in space, was labeled "Boldly Go". Later iterations of Digistar replaced the physical control panel with a common graphical user interface.

    Digistar 3, released in 2002, was the first Digistar system to offer full-dome video, using six projectors. Digistar 4 was able to cover the dome using only two projectors.

    == System limitations ==

    Though technologically advanced in its day, and the closest system to true full-dome video at the time of its release, the original Digistar and Digistar 2 are limited to projecting dots and lines, meaning only wireframe models can be projected. To compensate for this, the projector is capable of defocusing specific models, blurring lines and dots together. An example of this is the Digistar 2's built-in Milky Way model. The model is a circle of parallel lines that, when defocused, appear as the continuous band of the Milky Way across the sky. On more complex models, especially three-dimensional ones, brightness and details may be lost in this process, so it is not useful in all situations.

    The Digistar and Digistar 2 also suffer from focus limitations. Because they use a single lens to cover the entire dome, it is difficult to achieve perfect focus across the dome. Coupled with this, stars greater than a certain brightness are "multihit" points, meaning the projector draws two dots at the given position to accommodate the brightness of the star. Errors in the projector can lead the second dot to be slightly out of place relative to the first one. These two issues together, along with other issues that can occur within the projector's focus system, give the stars a blobby look. Some p

  • Referring expression generation


    Referring expression generation (REG) is the subtask of natural language generation (NLG) that has received the most scholarly attention. While NLG is concerned with the conversion of non-linguistic information into natural language, REG focuses only on the creation of referring expressions (noun phrases) that identify specific entities called targets.

    This task can be split into two parts. The content selection part determines which set of properties distinguishes the intended target, and the linguistic realization part defines how these properties are translated into natural language. A variety of algorithms have been developed in the NLG community to generate different types of referring expressions.

    == Types of referring expressions ==

    A referring expression (RE), in linguistics, is any noun phrase, or surrogate for a noun phrase, whose function in discourse is to identify some individual object (thing, being, event...). The technical terminology for "identify" differs a great deal from one school of linguistics to another. The most widespread term is probably "refer", and a thing identified is a "referent", as for example in the work of John Lyons. In linguistics, the study of reference relations belongs to pragmatics, the study of language use, though it is also a matter of great interest to philosophers, especially those wishing to understand the nature of knowledge, perception and cognition more generally.

    Various devices can be used for reference: determiners, pronouns, proper names... Reference relations can be of different kinds; referents can be in a "real" or imaginary world or in discourse itself, and they may be singular, plural, or collective.

    === Pronouns ===

    The simplest type of referring expression is a pronoun such as "he" or "it". The linguistics and natural language processing communities have developed various models for predicting anaphor referents, such as centering theory, and ideally referring-expression generation would be based on such models.
However, most NLG systems use much simpler algorithms, for example using a pronoun if the referent was mentioned in the previous sentence (or sentential clause), and no other entity of the same gender was mentioned in this sentence. === Definite noun phrases === There has been a considerable amount of research on generating definite noun phrases, such as the big red book. Much of this builds on the model proposed by Dale and Reiter. This has been extended in various ways; for example, Krahmer et al. present a graph-theoretic model of definite NP generation with many nice properties. In recent years a shared-task event has compared different algorithms for definite NP generation, using the TUNA corpus. === Spatial and temporal reference === Recently there has been more research on generating referring expressions for time and space. Such references tend to be imprecise (what is the exact meaning of tonight?), and also to be interpreted in different ways by different people. Hence it may be necessary to explicitly reason about false positive vs false negative tradeoffs, and even calculate the utility of different possible referring expressions in a particular task context. === Criteria for good expressions === Ideally, a good referring expression should satisfy a number of criteria: Referential success: It should unambiguously identify the referent to the reader. Ease of comprehension: The reader should be able to quickly read and understand it. Computational complexity: The generation algorithm should be fast. No false inferences: The expression should not confuse or mislead the reader by suggesting false implicatures or other pragmatic inferences. For example, a reader may be confused if told "Sit by the brown wooden table" in a context where there is only one table. == History == === Pre-2000 era === REG goes back to the early days of NLG. One of the first approaches was by Winograd, who in 1972 developed an "incremental" REG algorithm for his SHRDLU program. 
Afterwards, in the 1980s, researchers started to model the human ability to create referring expressions. This new approach to the topic was influenced by the researchers Appelt and Kronfeld, who created the programs KAMP and BERTRAND and considered referring expressions as parts of bigger speech acts. Among their most interesting findings were that referring expressions can be used to add information beyond the identification of the referent, and that the communicative context and the Gricean maxims influence referring expressions. Furthermore, its skepticism concerning the naturalness of minimal descriptions made Appelt and Kronfeld's research a foundation of later work on REG. The search for simple, well-defined problems changed the direction of research in the early 1990s. This new approach was led by Dale and Reiter, who stressed the identification of the referent as the central goal. Like Appelt, they discuss the connection between the Gricean maxims and referring expressions in their culminant paper, in which they also propose a formal problem definition. Furthermore, Reiter and Dale discuss the Full Brevity and Greedy Heuristics algorithms as well as their Incremental Algorithm (IA), which became one of the most important algorithms in REG. === Later developments === After 2000, research began to lift some of the simplifying assumptions that had been made in early REG research in order to create simpler algorithms. Different research groups concentrated on different limitations, creating several expanded algorithms. 
Often these extend the IA in a single direction, for example in relation to: reference to sets, like "the t-shirt wearers" or "the green apples and the banana on the left"; relational descriptions, like "the cup on the table" or "the woman who has three children"; context dependency, vagueness and gradability, which cover statements like "the older man" or "the car on the left" that are often unclear without a context; and salience and the generation of pronouns, which are highly discourse dependent, making for example "she" a reference to "the (most salient) female person". Many simplifying assumptions are still in place or have only just begun to be addressed. A combination of the different extensions has also yet to be done, and is called a "non-trivial enterprise" by Krahmer and van Deemter. Another important change after 2000 was the increasing use of empirical studies to evaluate algorithms. This development took place due to the emergence of transparent corpora. Although there are still discussions about what the best evaluation metrics are, the use of experimental evaluation has already led to better comparability of algorithms, a discussion about the goals of REG, and more task-oriented research. Furthermore, research has extended its range to related topics such as the choice of knowledge representation (KR) frameworks. In this area the main question, which KR framework is most suitable for use in REG, remains open. The answer to this question depends on how well descriptions can be expressed or found. Much of the potential of KR frameworks has been left unused so far. Some of the different approaches are: graph search, which treats relations between targets in the same way as properties; constraint satisfaction, which allows for a separation between problem specification and implementation; and modern knowledge representation, which offers logical inference in, for example, Description Logic or Conceptual Graphs. 
== Problem definition == Dale and Reiter (1995) think about referring expressions as distinguishing descriptions. They define: the referent as the entity that should be described; the context set as the set of salient entities; the contrast set, or potential distractors, as all elements of the context set except the referent; and a property as a reference to a single attribute–value pair. Each entity in the domain can be characterised as a set of attribute–value pairs, for example $\langle$type, dog$\rangle$, $\langle$gender, female$\rangle$ or $\langle$age, 10 years$\rangle$. The problem then is defined as follows: Let $r$ be the intended referent, and $C$ be the contrast set. Then, a set $L$ of attribute–value pairs will represent a distinguishing description if the following two conditions hold: (1) Every attribute–value pair in $L$ applies to $r$: that is, every element of $L$ specifies an attribute–value that $r$ possesses. (2) For every member $c$ of $C$, there is at least one element $l$ of $L$ that does not apply to $c$: that is, there is an $l$ in $L$ that specifies an attribute–value that $c$ does not possess. $l$ is said
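The two conditions of the problem definition can be checked mechanically. The following is a minimal Python sketch, not Dale and Reiter's own code: entities are assumed to be plain dictionaries from attribute to value, and the names `applies`, `is_distinguishing` and `incremental_description` are hypothetical. The second function is a simplified selection loop in the spirit of the Incremental Algorithm, with the attribute preference order supplied by the caller as an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

Entity = Dict[str, str]      # attribute -> value (assumed representation)
Property = Tuple[str, str]   # a single attribute-value pair


def applies(prop: Property, entity: Entity) -> bool:
    """Return True if the entity possesses this attribute-value pair."""
    attribute, value = prop
    return entity.get(attribute) == value


def is_distinguishing(description: Set[Property], referent: Entity,
                      contrast_set: List[Entity]) -> bool:
    """Check both conditions for a distinguishing description."""
    # Condition 1: every pair in the description applies to the referent.
    if not all(applies(p, referent) for p in description):
        return False
    # Condition 2: each distractor is ruled out by at least one pair.
    return all(any(not applies(p, c) for p in description)
               for c in contrast_set)


def incremental_description(referent: Entity, contrast_set: List[Entity],
                            preferred_attributes: List[str]
                            ) -> Optional[Set[Property]]:
    """Simplified incremental selection (illustrative assumption).

    Walk through the attributes in the given preference order and keep a
    property only if it rules out at least one remaining distractor.
    """
    description: Set[Property] = set()
    distractors = list(contrast_set)
    for attribute in preferred_attributes:
        if not distractors:
            break
        if attribute not in referent:
            continue
        prop = (attribute, referent[attribute])
        remaining = [c for c in distractors if applies(prop, c)]
        if len(remaining) < len(distractors):
            description.add(prop)
            distractors = remaining
    # None signals that no distinguishing description was found.
    return description if not distractors else None
```

For a referent `{"type": "dog", "gender": "female", "age": "10 years"}` with a male dog and a female cat as distractors, the preference order `["type", "gender", "age"]` selects the type and gender properties and never needs the age.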

    Read more →
  • Space partitioning

    Space partitioning

    In geometry, space partitioning is the process of dividing an entire space (usually a Euclidean space) into two or more disjoint subsets (see also partition of a set). In other words, space partitioning divides a space into non-overlapping regions. Any point in the space can then be identified as lying in exactly one of the regions. == Overview == Space-partitioning systems are often hierarchical, meaning that a space (or a region of space) is divided into several regions, and then the same space-partitioning system is recursively applied to each of the regions thus created. The regions can be organized into a tree, called a space-partitioning tree. Most space-partitioning systems use planes (or, in higher dimensions, hyperplanes) to divide space: points on one side of the plane form one region, and points on the other side form another. Points exactly on the plane are usually arbitrarily assigned to one or the other side. Recursively partitioning space using planes in this way produces a BSP tree, one of the most common forms of space partitioning. == Uses == === In computer graphics === Space partitioning is particularly important in computer graphics, and is used especially heavily in ray tracing, where it organizes the objects in a virtual scene. A typical scene may contain millions of polygons. Performing a ray/polygon intersection test with each would be very computationally expensive. Storing objects in a space-partitioning data structure (a k-d tree or BSP tree, for example) makes it easy and fast to perform certain kinds of geometry queries. For example, in determining whether a ray intersects an object, space partitioning can reduce the number of intersection tests to just a few per primary ray, yielding a logarithmic time complexity with respect to the number of polygons. 
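As an illustration of how a hierarchical partition avoids testing every object, here is a minimal Python sketch of a 2D k-d tree. It makes simplifying assumptions that go beyond the text: it stores points rather than polygons, and it answers axis-aligned box queries rather than ray intersections. The names `Node`, `build` and `range_query` are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    """One splitting plane (an axis-aligned line in 2D) and its two sides."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point = point   # the point lying on the splitting plane
        self.axis = axis     # 0 = split on x, 1 = split on y
        self.left = left     # points with coordinate <= the split value
        self.right = right   # points with coordinate >= the split value


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, alternating the axis."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))


def range_query(node: Optional[Node], lo: Point, hi: Point,
                found: Optional[List[Point]] = None) -> List[Point]:
    """Collect all points inside the box [lo, hi], skipping subtrees
    whose region cannot overlap the box."""
    if found is None:
        found = []
    if node is None:
        return found
    x, y = node.point
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        found.append(node.point)
    split = node.point[node.axis]
    if lo[node.axis] <= split:      # box reaches into the left region
        range_query(node.left, lo, hi, found)
    if hi[node.axis] >= split:      # box reaches into the right region
        range_query(node.right, lo, hi, found)
    return found
```

Points that tie with the split value may end up on either side, which mirrors the arbitrary assignment of points lying exactly on a plane; the query therefore descends into both sides when the box touches the plane.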
Space partitioning is also often used in scanline algorithms to eliminate the polygons outside the camera's viewing frustum, limiting the number of polygons processed by the pipeline. It is also used in collision detection: determining whether two objects are close to each other can be much faster using space partitioning. === In integrated circuit design === In integrated circuit design, an important step is the design rule check. This step ensures that the completed design is manufacturable. The check involves rules that specify widths, spacings and other geometry patterns. A modern design can have billions of polygons that represent wires and transistors. Efficient checking relies heavily on geometry queries. For example, a rule may specify that any polygon must be at least n nanometers from any other polygon. This is converted into a geometry query by enlarging a polygon by n/2 on all sides and querying for all intersecting polygons. === In probability and statistical learning theory === The number of components in a space partition plays a central role in some results in probability theory. See Growth function for more details. === In geography and GIS === There are many studies and applications where geographical spatial reality is partitioned by hydrological criteria, administrative criteria, mathematical criteria or many others. In the context of cartography and GIS (geographic information systems), it is common to identify cells of the partition by standard codes. Examples include the HUC code identifying hydrographical basins and sub-basins, ISO 3166-2 codes identifying countries and their subdivisions, and arbitrary DGGs (discrete global grids) identifying quadrants or locations. == Data structures == Common space-partitioning systems include: BSP trees, quadtrees, octrees, k-d trees, and bins. == Number of components == Suppose the $n$-dimensional Euclidean space is partitioned by $r$ hyperplanes that are $(n-1)$-dimensional. 
What is the number of components in the partition? The largest number of components is attained when the hyperplanes are in general position, i.e., no two are parallel and no three have the same intersection. Denote this maximum number of components by $Comp(n,r)$. Then, the following recurrence relation holds: $Comp(n,r) = Comp(n,r-1) + Comp(n-1,r-1)$, with $Comp(0,r) = 1$ (when there are no dimensions, there is a single point) and $Comp(n,0) = 1$ (when there are no hyperplanes, all the space is a single component). Its solution is: $Comp(n,r) = \sum_{k=0}^{n} \binom{r}{k}$ if $r \geq n$, and $Comp(n,r) = 2^{r}$ if $r \leq n$ (consider e.g. $r$ perpendicular hyperplanes; each additional hyperplane divides each existing component into 2), which is upper-bounded as: $Comp(n,r) \leq r^{n} + 1$.
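The recurrence and its closed-form solution can be cross-checked numerically. The sketch below is illustrative only; the function names `comp_recursive` and `comp_closed` are hypothetical. It relies on the fact that Python's `math.comb(r, k)` returns 0 when `k > r`, so the single sum also yields $2^{r}$ in the case $r \leq n$.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def comp_recursive(n: int, r: int) -> int:
    """Maximum number of components, computed from the recurrence."""
    if n == 0 or r == 0:
        # Base cases: a single point, or space with no hyperplanes.
        return 1
    return comp_recursive(n, r - 1) + comp_recursive(n - 1, r - 1)


def comp_closed(n: int, r: int) -> int:
    """Closed form: sum of binomial coefficients C(r, k) for k = 0..n."""
    return sum(comb(r, k) for k in range(n + 1))
```

For example, three lines in general position in the plane give `comp_closed(2, 3) == 7` regions, and three planes in three-dimensional space give `comp_closed(3, 3) == 8`, which equals $2^{3}$.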

    Read more →
  • Security.txt

    Security.txt

    security.txt is an accepted standard for website security information that allows security researchers to report security vulnerabilities easily. The standard prescribes a text file named security.txt in the well-known location, similar in syntax to robots.txt but intended to be both machine- and human-readable, for those wishing to contact a website's owner about security issues. security.txt files have been adopted by Google, GitHub, LinkedIn, and Facebook. == History == The Internet Draft was first submitted by Edwin Foudil in September 2017. At that time it covered four directives, "Contact", "Encryption", "Disclosure" and "Acknowledgement". Foudil expected to add further directives based on feedback. In addition, web security expert Scott Helme said he had seen positive feedback from the security community while use among the top 1 million websites was "as low as expected right now". In 2019, the Cybersecurity and Infrastructure Security Agency (CISA) published a draft binding operational directive that requires all US federal agencies to publish a security.txt file within 180 days. The Internet Engineering Steering Group (IESG) issued a Last Call for security.txt in December 2019 which ended on January 6, 2020. A study in 2021 found that over ten percent of top-100 websites published a security.txt file, with the percentage of sites publishing the file decreasing as more websites were considered. The study also noted a number of discrepancies between the standard and the content of the file. In April 2022, security.txt was accepted by the Internet Engineering Task Force (IETF) as RFC 9116. == File format == security.txt files can be served under the /.well-known/ directory (i.e. /.well-known/security.txt) or the top-level directory (i.e. /security.txt) of a website. The file must be served over HTTPS and in plaintext format.
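To make the file format concrete, here is a small Python sketch that parses a security.txt body into its fields. The sample content is hypothetical (example.com and its addresses are placeholders), and the parser is a simplification that assumes one `Name: value` field per line with `#` comments; it does not handle a signed file. The function name `parse_security_txt` is an assumption for illustration.

```python
from typing import Dict, List

# Hypothetical file content; example.com is a placeholder domain.
SAMPLE = """\
# Hypothetical security.txt for example.com
Contact: mailto:security@example.com
Expires: 2030-01-01T00:00:00.000Z
Encryption: https://example.com/pgp-key.txt
Preferred-Languages: en
"""


def parse_security_txt(text: str) -> Dict[str, List[str]]:
    """Map lower-cased field names to the list of values seen for them."""
    fields: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields
```

Values are kept in lists because a field such as Contact may be given more than once. A caller would fetch the text over HTTPS from `/.well-known/security.txt` before passing it to the parser.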

    Read more →
  • Data event

    Data event

    A data event is a relevant state transition defined in an event schema. Typically, event schemata are described by pre- and postconditions for a single data item or a set of data items. In contrast to ECA (event condition action), which considers an event to be a signal, a data event not only refers to the change (signal) but also describes specific state transitions, which are referred to in ECA as conditions. Considering data events as relevant data item state transitions allows defining complex event-reaction schemata for a database. Defining data event schemata for relational databases is limited to attribute and instance events. Object-oriented databases also support collection properties, which allows changes in collections to be defined as data events, too.
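The idea of an event schema as a pair of pre- and postconditions can be sketched in a few lines of Python. This is an illustrative assumption rather than an established API: a data item's state is modelled as a dictionary, and the names `DataEventSchema` and `detect_events` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # assumed representation of a data item's state


@dataclass
class DataEventSchema:
    """A relevant state transition: precondition on the old state,
    postcondition on the new state."""
    name: str
    precondition: Callable[[State], bool]
    postcondition: Callable[[State], bool]

    def matches(self, before: State, after: State) -> bool:
        return self.precondition(before) and self.postcondition(after)


def detect_events(schemata: List[DataEventSchema],
                  before: State, after: State) -> List[str]:
    """Return the names of all schemata matched by this state transition."""
    return [s.name for s in schemata if s.matches(before, after)]


# Hypothetical attribute event: an item's stock drops to zero.
OUT_OF_STOCK = DataEventSchema(
    name="out_of_stock",
    precondition=lambda s: s["quantity"] > 0,
    postcondition=lambda s: s["quantity"] == 0,
)
```

An update from quantity 3 to 0 matches the schema, while an update from 3 to 2, or a write that leaves the quantity at 0, does not. The change alone is not the event; the specific transition is.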

    Read more →
  • Automated negotiation

    Automated negotiation

    Automated negotiation is a form of interaction in systems that are composed of multiple autonomous agents, in which the aim is to reach agreements through an iterative process of making offers. Automated negotiation can be employed for many tasks human negotiators regularly engage in, such as bargaining and joint decision making. The main topics in automated negotiation revolve around the design of protocols and negotiating strategies. == History == Through digitization, the beginning of the 21st century has seen a growing interest in the automation of negotiation and e-negotiation systems, for example in the setting of e-commerce. This interest is fueled by the promise of automated agents being able to negotiate on behalf of human negotiators, and to find better outcomes than human negotiators. == Examples == Examples of automated negotiation include: Online dispute resolution, in which disagreements between parties are settled. Sponsored search auction, where bids are placed on advertisement keywords. Content negotiation, in which user agents negotiate over HTTP about how to best represent a web resource. Negotiation support systems, in which negotiation decision-making activities are supported by an information system.
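The iterative process of making offers described at the start of this entry can be illustrated with a toy protocol. The Python sketch below is an assumption-laden example, not a standard algorithm from the literature: two agents negotiate over a single integer price, each concedes by a fixed step per round up to a private limit, and each accepts an offer that is at least as good as the one it would propose next. The function name `negotiate` and all parameters are hypothetical.

```python
from typing import Optional, Tuple


def negotiate(buyer_start: int, buyer_limit: int,
              seller_start: int, seller_limit: int,
              step: int, max_rounds: int = 1000
              ) -> Optional[Tuple[int, int]]:
    """Return (agreed_price, round_number), or None if no agreement."""
    buyer, seller = buyer_start, seller_start
    for round_no in range(1, max_rounds + 1):
        # The seller's current offer is on the table. The buyer accepts if
        # it is no higher than the price the buyer would propose next.
        next_buyer = min(buyer + step, buyer_limit)
        if seller <= next_buyer:
            return seller, round_no
        # Otherwise the buyer counter-offers. The seller accepts if the
        # counter-offer is no lower than the seller's own next offer.
        next_seller = max(seller - step, seller_limit)
        if next_buyer >= next_seller:
            return next_buyer, round_no
        # Both agents are at their limits and still apart: give up.
        if next_buyer == buyer and next_seller == seller:
            return None
        buyer, seller = next_buyer, next_seller
    return None
```

With a buyer moving from 50 towards a limit of 80 and a seller moving from 100 towards a limit of 70 in steps of 5, the agents agree on 75 in round 5. If the buyer's limit is 60 instead, the limits do not overlap and the function returns `None`.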

    Read more →
  • Texture atlas

    Texture atlas

    In computer graphics, a texture atlas (also called a spritesheet or an image sprite in 2D game development) is an image containing multiple smaller images, usually packed together to reduce overall dimensions. An atlas can consist of uniformly-sized images or images of varying dimensions. A sub-image is drawn using custom texture coordinates to pick it out of the atlas. == Benefits == In an application where many small textures are used frequently, it is often more efficient to store the textures in a texture atlas which is treated as a single unit by the graphics hardware. This reduces both the disk I/O overhead and the overhead of a context switch by increasing memory locality. Careful alignment may be needed to avoid bleeding between sub textures when used with mipmapping and texture compression. In web development, images are packed into a sprite sheet to reduce the number of image resources that need to be fetched in order to display a page.
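Picking a sub-image out of an atlas comes down to computing its texture coordinates. The following Python sketch assumes the simplest case mentioned in the entry, an atlas of uniformly-sized images laid out in a grid, and further assumes row-major numbering with the origin at the top-left corner. The function name `sprite_uv` is hypothetical.

```python
from typing import Tuple


def sprite_uv(index: int, columns: int, rows: int
              ) -> Tuple[float, float, float, float]:
    """Return normalized (u0, v0, u1, v1) for the sprite at `index`.

    Assumes a uniform grid of `columns` x `rows` cells, numbered row by
    row, with (0, 0) at the top-left corner of the atlas.
    """
    if not 0 <= index < columns * rows:
        raise IndexError("sprite index outside the atlas grid")
    col = index % columns
    row = index // columns
    u0, v0 = col / columns, row / rows
    u1, v1 = (col + 1) / columns, (row + 1) / rows
    return u0, v0, u1, v1
```

In a 4 by 2 atlas, `sprite_uv(5, 4, 2)` returns `(0.25, 0.5, 0.5, 1.0)`, the second cell of the second row. A renderer that uses mipmapping would typically inset these coordinates slightly or pad the cells to limit bleeding between neighbouring sub-images.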

    Read more →
  • Parkerian Hexad

    Parkerian Hexad

    The Parkerian Hexad is a set of six elements of information security proposed by Donn B. Parker in 1998. The Parkerian Hexad adds three additional attributes to the three classic security attributes of the CIA triad (confidentiality, integrity, availability). The Parkerian Hexad attributes are the following: Confidentiality Possession or Control Integrity Authenticity Availability Utility These attributes of information are atomic in that they are not broken down into further constituents; they are non-overlapping in that they refer to unique aspects of information. Any information security breach can be described as affecting one or more of these fundamental attributes of information. == Attributes from the CIA triad == === Confidentiality === Confidentiality refers to the "quality or state of being private or secret; known only to a limited few", or "the property that information is not made available or disclosed to unauthorized individuals, entities, or processes". For example: If an enterprise's strategic plans are leaked to competitors then this is a breach of confidentiality; If unauthorized persons gain access to an individual's financial records then that individual's confidentiality is breached. === Integrity === Integrity refers to being correct or consistent with the intended state of information. Any unauthorized modification of data, whether deliberate or accidental, is a breach of data integrity. For example: Data stored on disk are expected to be stable. If the data is changed at random by problems with a disk controller then this is a breach of integrity; Data generated by a medical device is transmitted and stored in the healthcare center but neither altered nor tampered with; Application programs are supposed to record information correctly. If the application introduces deviations from the intended values then this is a breach of integrity. "From Donn Parker: My definition of information integrity comes from the dictionaries. 
Integrity means that the information is whole, sound, and unimpaired (not necessarily correct). It means nothing is missing from the information it is complete and in intended good order". === Availability === Availability means having timely access to information. For example: A disk crash and a denial-of-service attack both cause a breach of availability. Any delay in response of a system that exceeds the expected service levels for that system can be described as a breach of availability. GPS jamming can lead to loss of availability of the GPS system. == Parker's added attributes == === Authenticity === Authenticity is the "quality of being authentic or of established authority for truth and correctness". Parker defines it thus: "is the information genuine and accurate? Does it conform to reality and have validity?" and "authoritative, valid, true, real, genuine, or worthy of acceptance or belief by reason of conformity to fact and reality". === Possession or control === Possession or control refers to the loss of data by the authorized user (even if the "thief" cannot access the data). From a control systems perspective, it is any loss of control (the ability to change settings and functions) or loss of view (the ability to monitor the system’s operation and its response to controls). Suppose a thief were to steal a sealed envelope containing a bank debit card and its personal identification number. Even if the thief did not open that envelope, it's reasonable for the victim to be concerned that the thief could do so at any time. That situation illustrates a loss of control or possession of information but does not involve the breach of confidentiality. === Utility === Utility refers to the data's usefulness. For example: Suppose someone encrypted data on disk to prevent unauthorized access or undetected modifications–and then lost the decryption key: that would be a breach of utility. 
The data would be confidential, controlled, integral, authentic, and available–they just wouldn't be useful in that form. The conversion of salary data from one currency into an inappropriate currency would be a breach of utility, as would the storage of data in a format inappropriate for a specific computer architecture; e.g., EBCDIC instead of ASCII or 9-track magnetic tape instead of DVD-ROM. A tabular representation of data substituted for a graph could be described as a breach of utility if the substitution made it more difficult to interpret the data. Utility is often confused with availability because breaches such as those described in these examples may also require time to work around the change in data format or presentation. However, the concept of usefulness is distinct from that of availability.

    Read more →