In machine learning, feature hashing, also known as the hashing trick (by analogy to the kernel trick), is a fast and space-efficient way of vectorizing features, i.e. turning arbitrary features into indices in a vector or matrix. It works by applying a hash function to the features and using their hash values as indices directly (after a modulo operation), rather than looking the indices up in an associative array. In addition to its use for encoding non-numeric values, feature hashing can also be used for dimensionality reduction. This trick is often attributed to Weinberger et al. (2009), but there exists a much earlier description of this method published by John Moody in 1989. == Motivation == === Motivating example === In a typical document classification task, the input to the machine learning algorithm (both during learning and classification) is free text. From this, a bag of words (BOW) representation is constructed: the individual tokens are extracted and counted, and each distinct token in the training set defines a feature (independent variable) of each of the documents in both the training and test sets. Machine learning algorithms, however, are typically defined in terms of numerical vectors. Therefore, the bags of words for a set of documents is regarded as a term-document matrix where each row is a single document, and each column is a single feature/word; the entry i, j in such a matrix captures the frequency (or weight) of the j'th term of the vocabulary in document i. (An alternative convention swaps the rows and columns of the matrix, but this difference is immaterial.) Typically, these vectors are extremely sparse—according to Zipf's law. The common approach is to construct, at learning time or prior to that, a dictionary representation of the vocabulary of the training set, and use that to map words to indices. Hash tables and tries are common candidates for dictionary implementation. E.g., the three documents John likes to watch movies. Mary likes movies too. John also likes football. can be converted, using the dictionary to the term-document matrix ( John likes to watch movies Mary too also football 1 1 1 1 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 1 0 0 0 0 0 1 1 ) {\displaystyle {\begin{pmatrix}{\textrm {John}}&{\textrm {likes}}&{\textrm {to}}&{\textrm {watch}}&{\textrm {movies}}&{\textrm {Mary}}&{\textrm {too}}&{\textrm {also}}&{\textrm {football}}\\1&1&1&1&1&0&0&0&0\\0&1&0&0&1&1&1&0&0\\1&1&0&0&0&0&0&1&1\end{pmatrix}}} (Punctuation was removed, as is usual in document classification and clustering.) The problem with this process is that such dictionaries take up a large amount of storage space and grow in size as the training set grows. On the contrary, if the vocabulary is kept fixed and not increased with a growing training set, an adversary may try to invent new words or misspellings that are not in the stored vocabulary so as to circumvent a machine learned filter. To address this challenge, Yahoo! Research attempted to use feature hashing for their spam filters. Note that the hashing trick isn't limited to text classification and similar tasks at the document level, but can be applied to any problem that involves large (perhaps unbounded) numbers of features. === Mathematical motivation === Mathematically, a token is an element t {\displaystyle t} in a finite (or countably infinite) set T {\displaystyle T} . Suppose we only need to process a finite corpus, then we can put all tokens appearing in the corpus into T {\displaystyle T} , meaning that T {\displaystyle T} is finite. However, suppose we want to process all possible words made of the English letters, then T {\displaystyle T} is countably infinite. Most neural networks can only operate on real vector inputs, so we must construct a "dictionary" function ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} . When T {\displaystyle T} is finite, of size | T | = m ≤ n {\displaystyle |T|=m\leq n} , then we can use one-hot encoding to map it into R n {\displaystyle \mathbb {R} ^{n}} . First, arbitrarily enumerate T = { t 1 , t 2 , . . , t m } {\displaystyle T=\{t_{1},t_{2},..,t_{m}\}} , then define ϕ ( t i ) = e i {\displaystyle \phi (t_{i})=e_{i}} . In other words, we assign a unique index i {\displaystyle i} to each token, then map the token with index i {\displaystyle i} to the unit basis vector e i {\displaystyle e_{i}} . One-hot encoding is easy to interpret, but it requires one to maintain the arbitrary enumeration of T {\displaystyle T} . Given a token t ∈ T {\displaystyle t\in T} , to compute ϕ ( t ) {\displaystyle \phi (t)} , we must find out the index i {\displaystyle i} of the token t {\displaystyle t} . Thus, to implement ϕ {\displaystyle \phi } efficiently, we need a fast-to-compute bijection h : T → { 1 , . . . , m } {\displaystyle h:T\to \{1,...,m\}} , then we have ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In fact, we can relax the requirement slightly: It suffices to have a fast-to-compute injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} , then use ϕ ( t ) = e h ( t ) {\displaystyle \phi (t)=e_{h(t)}} . In practice, there is no simple way to construct an efficient injection h : T → { 1 , . . . , n } {\displaystyle h:T\to \{1,...,n\}} . However, we do not need a strict injection, but only an approximate injection. That is, when t ≠ t ′ {\displaystyle t\neq t'} , we should probably have h ( t ) ≠ h ( t ′ ) {\displaystyle h(t)\neq h(t')} , so that probably ϕ ( t ) ≠ ϕ ( t ′ ) {\displaystyle \phi (t)\neq \phi (t')} . At this point, we have just specified that h {\displaystyle h} should be a hashing function. Thus we reach the idea of feature hashing. == Algorithms == === Feature hashing (Weinberger et al. 2009) === The basic feature hashing algorithm presented in (Weinberger et al. 2009) is defined as follows. First, one specifies two hash functions: the kernel hash h : T → { 1 , 2 , . . . , n } {\displaystyle h:T\to \{1,2,...,n\}} , and the sign hash ζ : T → { − 1 , + 1 } {\displaystyle \zeta :T\to \{-1,+1\}} . Next, one defines the feature hashing function: ϕ : T → R n , ϕ ( t ) = ζ ( t ) e h ( t ) {\displaystyle \phi :T\to \mathbb {R} ^{n},\quad \phi (t)=\zeta (t)e_{h(t)}} Finally, extend this feature hashing function to strings of tokens by ϕ : T ∗ → R n , ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ϕ ( t j ) {\displaystyle \phi :T^{}\to \mathbb {R} ^{n},\quad \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\phi (t_{j})} where T ∗ {\displaystyle T^{}} is the set of all finite strings consisting of tokens in T {\displaystyle T} . Equivalently, ϕ ( t 1 , . . . , t k ) = ∑ j = 1 k ζ ( t j ) e h ( t j ) = ∑ i = 1 n ( ∑ j : h ( t j ) = i ζ ( t j ) ) e i {\displaystyle \phi (t_{1},...,t_{k})=\sum _{j=1}^{k}\zeta (t_{j})e_{h(t_{j})}=\sum _{i=1}^{n}\left(\sum _{j:h(t_{j})=i}\zeta (t_{j})\right)e_{i}} ==== Geometric properties ==== We want to say something about the geometric property of ϕ {\displaystyle \phi } , but T {\displaystyle T} , by itself, is just a set of tokens, we cannot impose a geometric structure on it except the discrete topology, which is generated by the discrete metric. To make it nicer, we lift it to T → R T {\displaystyle T\to \mathbb {R} ^{T}} , and lift ϕ {\displaystyle \phi } from ϕ : T → R n {\displaystyle \phi :T\to \mathbb {R} ^{n}} to ϕ : R T → R n {\displaystyle \phi :\mathbb {R} ^{T}\to \mathbb {R} ^{n}} by linear extension: ϕ ( ( x t ) t ∈ T ) = ∑ t ∈ T x t ζ ( t ) e h ( t ) = ∑ i = 1 n ( ∑ t : h ( t ) = i x t ζ ( t ) ) e i {\displaystyle \phi ((x_{t})_{t\in T})=\sum _{t\in T}x_{t}\zeta (t)e_{h(t)}=\sum _{i=1}^{n}\left(\sum _{t:h(t)=i}x_{t}\zeta (t)\right)e_{i}} There is an infinite sum there, which must be handled at once. There are essentially only two ways to handle infinities. One may impose a metric, then take its completion, to allow well-behaved infinite sums, or one may demand that nothing is actually infinite, only potentially so. Here, we go for the potential-infinity way, by restricting R T {\displaystyle \mathbb {R} ^{T}} to contain only vectors with finite support: ∀ ( x t ) t ∈ T ∈ R T {\displaystyle \forall (x_{t})_{t\in T}\in \mathbb {R} ^{T}} , only finitely many entries of ( x t ) t ∈ T {\displaystyle (x_{t})_{t\in T}} are nonzero. Define an inner product on R T {\displaystyle \mathbb {R} ^{T}} in the obvious way: ⟨ e t , e t ′ ⟩ = { 1 , if t = t ′ , 0 , else. ⟨ x , x ′ ⟩ = ∑ t , t ′ ∈ T x t x t ′ ⟨ e t , e t ′ ⟩ {\displaystyle \langle e_{t},e_{t'}\rangle ={\begin{cases}1,{\text{ if }}t=t',\\0,{\text{ else.}}\end{cases}}\quad \langle x,x'\rangle =\sum _{t,t'\in T}x_{t}x_{t'}\langle e_{t},e_{t'}\rangle } As a side note, if T {\displaystyle T} is infinite, then the inner product space R T {\displaystyle \mathbb {R} ^{T}} is not complete. Taking its completion would get us to a Hilbert space, which allows well-behaved infinite sums. Now we have an inner product space, with enough structure to describe the geometry of the feature hashing function ϕ : R T → R n {\displaystyle \phi :\ma
Conservative morphological anti-aliasing
Conservative morphological anti-aliasing (CMAA) is an antialiasing technique originally developed by Filip Strugar at Intel. CMAA is an image-based, post processing technique similar to that of morphological antialiasing. CMAA uses 4 main steps which are image analysis for color discontinuities, locally dominant edge detection, simple shape handling, and lastly symmetrical long edge shape handling. A couple of years after CMAA was introduced, Intel unveiled an updated version which they named CMAA2.
Raseef22
Raseef22 (Arabic: رصيف22) is a liberal Arabic media network founded in 2013 based in Beirut, Lebanon. It publishes content in Arabic and English from different Arab states and describes itself as an independent media platform. International Media Support mentions Raseef22 along with HuffPost Arabic and Al Jazeera as one of the biggest Pan-Arab online platforms. == Name == The Arabic word raseef (رَصِيف) means platform or pavement, and the number 22 refers to the number of states in the Arab League. == History == Kareem Sakka co-founded Raseef22 in the aftermath of the Arab Spring, which he cites as a source of inspiration. In an article in The Washington Post, he wrote that Raseef22 was created as a "digital space for those eager to know what was going on around them." Raseef22 was one of the 500 websites censored in Egypt in late 2017 after it published an article on Egyptian security agencies' vies to influence the media. After the site was blocked in Egypt, it was targeted in a cyber attack that took it offline in locations around the world. Jamal Khashoggi wrote for Raseef22 regularly. One of his notable articles was "Notes on the Freedom of the Arabs from Oslo, Norway," published June 5, 2018. The site was blocked in Saudi Arabia December 2018 when the Saudi Ministry of Communications and Information Technology ordered its censorship due to its "unprecedented response to the assassination of Jamal Khashoggi in Istanbul." This decision might have also been related to Raseef22's coverage of Saudi-Israeli relations and interviews with activists later imprisoned or placed under house arrest coverage In 2019 the Association of LGBT Journalists (AJL) in Paris gave Raseef22 a golden foreign press award for its six-month series of articles on gender and sexuality issues. == Readership == According to its publisher in 2019, the news agency counted 12 million readers annually from 22 Arab nations. Of the readership, he wrote that it "believes in the talent and promise of the Arab mind and sees the ugliness of tyranny, patriarchy, misogyny and the futility of proxy rulers and wars." Al-Quds Al-Arabi described Raseef22 as "oriented to the youth."
Elonis v. United States
Elonis v. United States, 575 U.S. 723 (2015), was a United States Supreme Court case concerning whether conviction of threatening another person over interstate lines (under 18 U.S.C. § 875(c)) requires proof of subjective intent to threaten or whether it is enough to show that a "reasonable person" would regard the statement as threatening. In controversy were the purported threats of violent rap lyrics written by Anthony Douglas Elonis and posted to Facebook under a pseudonym. The ACLU filed an amicus brief in support of the petitioner. It was the first time the Court has heard a case considering true threats and the limits of speech on social media. == Background == In May 2010, Elonis was in the process of divorce and made a number of public Facebook posts. Prior to his postings, he had lost his job at an amusement park. He "posted the script of a sketch" by The Whitest Kids U' Know, which originally referenced saying "I want to kill the President of the United States" and replaced the president with his wife: Elonis ended the post with this statement: "Art is about pushing limits. I'm willing to go to jail for my constitutional rights. Are you?" A week later, Elonis posted about local law enforcement and a kindergarten class, which caught the attention of the Federal Bureau of Investigation. Then, he wrote a post on Facebook about one of the agents who visited him: He concluded: == Arrest and Conviction == These actions led to Elonis's arrest on December 8, 2010. He was indicted by a grand jury on five counts of threats to his estranged ex-wife, park employees and visitors, local law enforcement, an FBI agent, and a kindergarten class that had been relayed through interstate communication. At the district court, Elonis moved to dismiss the indictment for failing to allege that he had intended to threaten anyone, claiming his Facebook post was not were not intended as a threat. He argued that, as an aspiring rap artist, his posts were intended to be a form of artistic expression to help him cope with his recent loses. According to him, he did not mean anything said in his posts in a literal sense. His motion was denied. He requested a jury instruction that "the government must prove that he intended to communicate a true threat", which was also denied. He was convicted on the last four of the five counts, and was sentenced to 44 months in prison and three years on supervised release. He appealed unsuccessfully to the Third Circuit, renewing his challenge to the jury instructions. He then appealed to the U.S. Supreme Court based on lack of any attempt to show intent to threaten and on First Amendment rights. == Decision == On June 1, 2015, the U.S. Supreme Court reversed Elonis's conviction in an 8–1 decision. Chief Justice John Roberts wrote for a seven-justice majority, Samuel Alito authored an opinion concurring in part and dissenting in part, and Clarence Thomas authored a dissenting opinion. The finding of the circuit court was reversed and the matter remanded. === Majority opinion === The majority opinion, written by Roberts, did not rule on First Amendment matters or on the question of whether recklessness was sufficient mens rea to show intent. It ruled that mens rea was required to prove the commission of a crime under §875(c). Importantly, the mens rea issue had been preserved for review, since Elonis had raised that objection at every stage of the previous proceedings. The government contended that the presence of the words "intent to extort" in §875(b) and §875(d) implied that the absence in §875(c) was constructive. The court disagreed, holding that the absence of the language in §875(c) was because the section was intended to have a broader scope than threats relating to extortion. The opinion drew on many Supreme Court cases holding that in criminal law, mens rea was required though it had not been mentioned explicitly in statute. Consequently, the Supreme Court ruled in favor of Elonis. === Alito's concurrence === Justice Samuel Alito, concurring in part and dissenting in part, opined that while agreeing that mens rea was required and specifically that showing negligence was not sufficient, the court should have ruled on the question of recklessness. He further opined that recklessness was sufficient to show a crime under that provision on the basis that going further would amount to amending the statute, rather than interpreting it. Since Elonis explicitly argued that recklessness was not sufficient, Alito said: I would therefore remand for the Third Circuit to determine if Elonis’s failure (indeed, refusal) to argue for recklessness prevents reversal of his conviction. The Third Circuit should also have the opportunity to consider whether the conviction could be upheld on harmless error grounds. Alito also addressed the First Amendment question, elided by the majority opinion. He held that "lyrics in songs that are performed for an audience or sold in recorded form are unlikely to be interpreted as a real threat to a real person. ... Statements on social media that are pointedly directed at their victims, by contrast, are much more likely to be taken seriously." === Thomas's dissent === Justice Clarence Thomas, dissenting, wrote against discarding the "general intent" standard without replacing it with a clearer standard. Thomas argued that "there is no historical practice requiring more than general intent when a statute regulates speech." Thomas cited Rosen v. United States, arguing that general intent was sufficient in this case. However, the majority opinion offers refutation in that Rosen turned on ignorance of the law: knowledge as to whether material was legally obscene, not on whether it was intended to be obscene. Thomas also supported the government's claim that the presence of "intent to extort" language in the adjacent §875(b) and did not address the majority's reasoning on that language. Thomas used precedent, notably from the states and 18th-century England based on other but similar and, arguably, influencing legislation to support his "general intent" claim. Thomas also drew a parallel with general intent in tort. While he sought to address the First Amendment issues, he never strayed far from "general intent". == Aftermath == On remand, the Third Circuit reaffirmed the conviction "concluding beyond a reasonable doubt that Elonis would have been convicted if the jury had been properly instructed" and therefore was harmless error. In 2022, Elonis was once again arrested and indicted on three counts of cyberstalking involving three people. It was discovered that between 2018 and 2021, Elonis had sent numerous threatening messages over email, text, voice mail, and social media platforms like Twitter to a former prosecutor of the Eastern District of Pennsylvania, his ex-girlfriend, and ex-wife. On August 5, after a five-day trial, Elonis was found guilty on all three counts, and on March 23, 2023, he was sentenced by U.S. District Court Judge Edward G. Smith of Easton, Pennsylvania to twelve years and seven months in prison.
Ajax (programming)
The Asynchronous JavaScript and XML, usually referred to as Ajax (or AJAX, ) is a set of web development techniques that uses various web technologies on the client-side to create asynchronous web applications. With Ajax, web applications can send and retrieve data from a server asynchronously (in the background) without interfering with the display and behaviour of the existing page. By decoupling the data interchange layer from the presentation layer, Ajax allows web pages and, by extension, web applications, to change content dynamically without the need to reload the entire page. In practice, modern implementations commonly utilize JSON instead of XML. Ajax is not a technology, but rather a programming pattern. HTML and CSS can be used in combination to mark up and style information. The webpage can be modified by JavaScript to dynamically display (and allow the user to interact with) the new information. The built-in XMLHttpRequest object is used to execute Ajax on webpages, allowing websites to load content onto the screen without refreshing the page. == History == In the early-to-mid 1990s, most Websites were based on complete HTML pages. Each user action required a complete new page to be loaded from the server. This process was inefficient, as reflected by the user experience: all page content disappeared, then the new page appeared. Each time the browser reloaded a page because of a partial change, all the content had to be re-sent, even though only some of the information had changed. This placed additional load on the server and made bandwidth a limiting factor in performance. The foundations of AJAX originate back in 1996 with the introduction of JavaScript 1. Developers quickly discovered that any HTML element which accepted a "src" attribute could be used to fetch remote data. By changing the src of a hidden frame, a developer could fetch remote data, process or display it without a page refresh. The remote data could be a string, JavaScript code, XML or a partial HTML page generated on the server. The same could be done with and
Arabic Ontology
Arabic Ontology is a website offering linguistic ontology services for the Arabic language which can be used like the online site WordNet. Users can use Arabic Ontology to classify or clarify the concepts and meanings of Arabic terms. == Ontology Structure == The ontology structure (i.e., data model) is similar to WordNet's structure. Each concept in the database is given a unique concept identifier (URI), informally described by a gloss, and lexicalized by one or more synonymous lemma terms. Each term-concept pair is called a sense, and is given a SenseID. A set of senses is called synset. Concepts and senses are described by further attributes such as era and area — to specify example usage and ontological analysis. Semantic relations are defined between concepts. Some important entities are included in the ontology, such as individual countries and bodies of water. These individuals are given separate IndividualIDs and linked with their concepts through the InstanceOf relation. == Mappings to other resources == Concepts in the Arabic Ontology are mapped to synsets in WordNet, as well as to BFO and DOLCE. Terms used in the Arabic Ontology are mapped to lemmas in the LDC's SAMA database. == Applications == Arabic Ontology can be used in many application domains, such as: Information retrieval, to enrich queries (e.g., in search engines) and improve the quality of the results, i.e. meaningful search rather than string-matching search; Machine translation and word-sense disambiguation, by finding the exact mapping of concepts across languages, especially that the Arabic ontology is also mapped to the WordNet; Data Integration and interoperability in which the Arabic ontology can be used as a semantic reference to link databases and information systems; Semantic Web and Web 3.0, by using the Arabic ontology as a semantic reference to disambiguate the meanings used in websites; among many other applications. == URLs Design == The URLs in the Arabic Ontology are designed according to the W3C's Best Practices for Publishing Linked Data, as described in the following URL schemes. This allows one to also explore the whole database like exploring a graph: Ontology Concept: Each concept in the Arabic Ontology has a ConceptID and can be accessed using: https://{domain}/concept/{ConceptID | Term}. In case of a term, the set of concepts that this term lexicalizes are all retrieved. In case of a ConceptID, the concept and its direct subtypes are retrieved, e.g. https://ontology.birzeit.edu/concept/293198 Semantic relations: Relationships between concepts can be accessed using these schemes: (i) the URL: https:// {domain}/concept/{RelationName}/{ConceptID} allows retrieval of relationships among ontology concepts. (ii) the URL: https://{domain}/lexicalconcept/{RelationName}/{lexicalConceptID} allows retrieval of relations between lexical concepts. For example, https://ontology.birzeit.edu/concept/instances/293121 retrieves the instances of the concept 293121. The relations that are currently used in our database are: {subtypes, type, instances, parts, related, similar, equivalent}.
Interference (communication)
In telecommunications, an interference is that which modifies a signal in a disruptive manner, as it travels along a communication channel between its source and receiver. The term is often used to refer to the addition of unwanted signals to a useful signal. Common examples include: Electromagnetic interference (EMI) Co-channel interference (CCI), also known as crosstalk Adjacent-channel interference (ACI) Intersymbol interference (ISI) Inter-carrier interference (ICI), caused by doppler shift in OFDM modulation (multitone modulation). Common-mode interference (CMI) Conducted interference Noise is a form of interference but not all interference is noise. Radio resource management aims at reducing and controlling the co-channel and adjacent-channel interference. == Interference alignment == A solution to interference problems in wireless communication networks is interference alignment, which was crystallized by Syed Ali Jafar at the University of California, Irvine. A specialized application was previously studied by Yitzhak Birk and Tomer Kol for an index coding problem in 1998. For interference management in wireless communication, interference alignment was originally introduced by Mohammad Ali Maddah-Ali, Abolfazl S. Motahari, and Amir Keyvan Khandani, at the University of Waterloo, for communication over wireless X channels. Interference alignment was eventually established as a general principle by Jafar and Viveck R. Cadambe in 2008, when they introduced "a mechanism to align an arbitrarily large number of interferers, leading to the surprising conclusion that wireless networks are not essentially interference limited." This led to the adoption of interference alignment in the design of wireless networks. Jafar explained: My research group crystallized the concept of interference alignment and showed that through interference alignment, it is possible for everyone to access half of the total bandwidth free from interference. Initially this result was shown under a number of idealized assumptions that are typical in theoretical studies. We have since continued to work on peeling off these idealizations one at a time, to bring the theory closer to practice. Along the way we have made numerous discoveries through the lens of interference alignment, which reveal new and powerful signaling schemes. According to New York University senior researcher Paul Horn: Syed Jafar revolutionized our understanding of the capacity limits of wireless networks. He demonstrated the astounding result that each user in a wireless network can access half of the spectrum without interference from other users, regardless of how many users are sharing the spectrum. This is a truly remarkable result that has a tremendous impact on both information theory and the design of wireless networks.