AI Coding Kya Hota Hai

AI Coding Kya Hota Hai — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Qapital

    Qapital

    Qapital is a personal finance mobile application (app) for the iOS and Android operating systems, developed by Qapital, LLC. The app is designed to motivate users to save money through a gamification of their spending behavior. It moves money from a user's checking account to a separate Qapital account, when certain rules are triggered. Its database is used by psychology professor Dan Ariely to study consumer behavior. Qapital was released in Sweden in 2013, then in the US in early 2015. The application was later withdrawn from the Swedish market in April 2015, in order to focus on the US market. == History == The idea for Qapital was conceived by ex-bankers in Sweden. The software was designed by twin brothers Daniel and Andreas Källbom of Studio Källbom and released in Sweden in December 2013. The original software was a personal finance dashboard, similar to Mint.com, to show its users how they spent their money. Qapital introduced the app into the US market with a different design in 2014 and started focusing exclusively on the US market. The app was re-designed to focus on building savings rather than managing personal finances. The Swedish version shut down in April 2015. The app was initially restricted to the iOS platform, but an Android version was released at the end of 2015. Shortly after its US launch, Qapital invited psychology professor Dan Ariely to join its team as its "chief behavioral economist". He uses the app's database to conduct research into behavioral economics and Qapital in turn uses Ariely's research in design and programming decisions. In 2017, Qapital added checking and debit card services to the app. == Concept and features == Qapital is a free personal finance app for iOS and Android devices, intended to encourage its users to save money. Qapital directs each of its users to set savings goals, then automatically transfers money from their checking account to an account for savings, when a rule established in the app is met. It uses the "if this then that" (IFTTT) rule-based web-service. For example, one rule could be that if a user purchases a cup of coffee, then the app will round up the charge to the nearest dollar and deposit the difference into savings. Users connect their bank accounts to Qapital, so it knows when purchases are made. When a rule is met, money for savings are transferred to a Qapital account operated in partnership with Lincoln Savings Bank. As of 2015, Qapital can connect to more than 180 other apps, such as Facebook, Twitter, Dropbox and Instagram. For example, connecting to Jawbone allows the user to set a rule that if they take a certain number of steps during the day, a set amount of money is transferred to savings. The app also allows users to monitor activity among their other financial accounts, such as deposits and withdrawals. == Reception == In an October 2015 review, PC Magazine gave Qapital four out of five marks and an editor rating of "excellent." The review praised the app for having a "lovely design" and criticized it for being a, "bit simplistic in some of its rules." Bankrate, in a May 2015 review, gave the app a score of 3/5 for "ease of use," 5/5 for "features," 4/5 for "effectiveness," 4/5 for "value," for a total score of 16/20. The reviewer criticized Qapital's savings account for providing a low-interest rate, but concluded that its numerous features make the app "intriguing" and "it would be difficult to find a standard bank app more fun to use than Qapital."

    Read more →
  • Bus encryption

    Bus encryption

    Bus encryption is the use of encrypted program instructions on a data bus in a computer that includes a secure cryptoprocessor for executing the encrypted instructions. Bus encryption is used primarily in electronic systems that require high security, such as automated teller machines, TV set-top boxes, and secure data communication devices such as two-way digital radios. Bus encryption can also mean encrypted data transmission on a data bus from one processor to another processor. For example, from the CPU to a GPU which does not require input of encrypted instructions. Such bus encryption is used by Windows Vista and newer Microsoft operating systems to protect certificates, BIOS, passwords, and program authenticity. PVP-UAB (Protected Video Path) provides bus encryption of premium video content in PCs as it passes over the PCIe bus to graphics cards to enforce digital rights management. The need for bus encryption arises when multiple people have access to the internal circuitry of an electronic system, either because they service and repair such systems, stock spare components for the systems, own the system, steal the system, or find a lost or abandoned system. Bus encryption is necessary not only to prevent tampering of encrypted instructions that may be easily discovered on a data bus or during data transmission, but also to prevent discovery of decrypted instructions that may reveal security weaknesses that an intruder can exploit. In TV set-top boxes, it is necessary to download program instructions periodically to customer's units to provide new features and to fix bugs. These new instructions are encrypted before transmission, but must also remain secure on data buses and during execution to prevent the manufacture of unauthorized cable TV boxes. This can be accomplished by secure crypto-processors that read encrypted instructions on the data bus from external data memory, decrypt the instructions in the cryptoprocessor, and execute the instructions in the same cryptoprocessor.

    Read more →
  • NATGRID

    NATGRID

    The National Intelligence Grid or NATGRID is an integrated intelligence master database structure for counter-terrorism purposes which connects databases of various core security agencies under the Government of India. It collects and analyses comprehensive patterns procured from 21 different organizations that can be readily accessed by security agencies round the clock. As of September 2025 its CEO is Hirdesh Kumar. NATGRID came into existence after the 2008 Mumbai attacks. The Government of India in July 2016 appointed Ashok Patnaik as the Chief Executive Officer (CEO) of NATGRID. The appointment is being seen as the government's effort to revive the project. Patnaik's appointment was valid till 31 December 2018. As of 2019, NATGRID is headed by an Indian Police Service (IPS) officer Ashish Gupta. The Ministry of Home Affairs on 5 February 2020 announced in Parliament that Project NATGRID with all its required physical infrastructures been completed as of 31 March 2020 and the NATGRID solution went live as of 31 December 2020. == Reason for establishment == The landscape of Terrorism in India and the subsequent response by Law enforcement in India have necessitated a sophisticated data-integration framework, positioning NATGRID as a vital tool for national security agencies. This shift towards Mass surveillance in India is rooted in a broader policy evolution of state monitoring, which is technologically enabled by the India Stack—the foundational digital infrastructure providing the API-based backbone for government service delivery and identity verification. This ecosystem is further bolstered by advanced Signal intelligence capabilities and the implementation of SIM binding, a security protocol that anchors a user’s digital identity to a specific mobile device and verified SIM card to prevent identity fraud and unauthorized access. Collectively, these elements form a 360-degree surveillance and authentication grid designed to preemptively identify threats by synthesizing historical, financial, and real-time communication data across disparate platforms. === Terror attacks in India === The 2008 Mumbai attacks led to the exposure of several weaknesses in India's intelligence gathering and action networks. NATGRID is part of the radical overhaul of the security and intelligence apparatuses of India that was mooted by the then Home Minister P. Chidambaram in 2009. The National Investigation Agency (NIA) and the National Counter Terrorism Centre (NCTC) are two organisations established in the aftermath of the Mumbai attacks of 2008. Before the Mumbai attacks, a Pakistani origin American Lashkar-e-Taiba (LeT) operative David Coleman Headley had visited India several times and done a recce of the places that came under attack on 26/11. Despite having travelled to India several times and having returned to the US through Pakistan or West Asia, his trips failed to raise the suspicion of Indian agencies as they lacked a system that could reveal a pattern in his unusual travel itineraries and trips to the country. It was argued that if they had a system like the NATGRID in place, Headley would have been apprehended well before the attacks. === Need for the integrated intelligence system === During the inauguration of NATGRID campus in Bengaluru, the Minister of Home Affairs, Amit Shah stated that a new national database is in the process of being made which will bring a change in the current ways of functioning of agencies once it's ready also adding that the government has entrusted the task of developing and operating a state-of-the-art and innovative technology system. It is accessible to 11 central agencies in the first phase and in later phases will be made accessible to police of all States and Union Territories and only authorized personnel are allowed access to the platform on a case-to-case basis for investigations into suspected cases of terrorism. NATGRID has a total fund allocation of ₹3,400 crore (US$355 million). d == Legal framework == Relevant legal framework: Digital Personal Data Protection Act, 2023 – The legislative framework governing how digital data is handled. Information Technology Act - Interception Rules, 2002 – The specific regulations under the Information Technology Act that govern these agencies. National Security Act of 1980, evidence-based preventative detention of suspects Right to Information Act, 2005, for obtaining information from the government and used by activists and whistleblowers == Structure and functions == === Multi-agency integrated intelligence database === NATGRID is an intelligence sharing network that collates data from the standalone databases of the various agencies and ministries of the Indian government. It is a counter terrorism measure that collects and collates a host of information from government databases including tax and bank account details, credit/debit card transactions, visa and immigration records and itineraries of rail and air travel. It also has access to the Crime and Criminal Tracking Network and Systems, a database that links crime information, including First Information Reports, across 14,000 police stations in India. This combined data will be made available to 11 central agencies, which are: the Research and Analysis Wing (R&AW), Intelligence Bureau (IB), National Investigation Agency (NIA), Central Bureau of Investigation (CBI), Narcotics Control Bureau (NCB), Financial Intelligence Unit (India) (FIU), Enforcement Directorate (ED), Central Board of Direct Taxes (CBDT), Central Board of Indirect Taxes and Customs (CBIC), Directorate of Revenue Intelligence (DRI) and Directorate General of GST Intelligence. Also as stated by the MHA, NATGRID will have an in-built mechanism for continuous upgradation. In the later phases of NATGRID integration, the central government further plans to integrate 950 additional organizations into it. === Key components and users === ==== Some important backend data feeds to the NATGRID (middleware) ==== National Crime Records Bureau's Crime and Criminal Tracking Network and Systems (CCTNS) national-integrated law-and-order database for the state-level police forces: CCTNS is a mission-mode project under the National e-Governance Plan that interconnects over 15,000 police stations across India. It serves as the primary source for NATGRID to access digitized FIR (First Information Report) data and criminal history records from state-level law enforcement. NSA's National Technical Research Organisation (NTRO) national security-based database feed to NATGRID: NTRO serves as a primary technical data provider to NATGRID, offering specialized intercepts and satellite imagery. While NATGRID functions as a centralized data-integration middleware under the Ministry of Home Affairs, NTRO reports to the National Security Advisor within the Prime Minister's Office. DRDO's NETRA (Network Traffic Analysis) ELINT-based mass surveillance system for monitor internal internet traffic for keywords related to terrorism and criminal activity within Indian borders: Developed by the Centre for Artificial Intelligence and Robotics (CAIR), NETRA is an internet monitoring system capable of scanning traffic for specific trigger words. It provides digital behavioral triggers that NATGRID can cross-reference against structural data like financial or travel records. NETRA is a massive software network used to intercept and analyze internet traffic (emails, social media, blogs) for keywords like "bomb," "attack," or "kill." The intelligence gathered by NETRA regarding suspicious digital patterns or "keyword hits" can be fed into NATGRID. This allows an investigator to see if a person flagged by NETRA also has suspicious travel (from airline databases) or financial records (from bank databases) linked within NATGRID. Department of Telecommunications (DoT's Central Monitoring System (CMS) for lawfully intercepting national and international telecomm data: CMS is the centralized system for lawful interception of all telecommunications (phone calls, SMS, and data) in India, managed by the Department of Telecommunications (DoT). While CMS focuses on the content and metadata of real-time communication, NATGRID focuses on historical/structural data (tax, travel, identity). They represent two halves of a 360-degree surveillance profile: CMS listens to what a suspect says, while NATGRID tracks where they go and what they own. The CMS allows for the lawful interception of telecommunications metadata and content in real-time. In the broader surveillance architecture, CMS provides the "active" communication profile while NATGRID provides the "static" historical profile. Telecom Enforcement Resource and Monitoring (TERM) - Telecomm Regulatory & Verification Node for telecomm KYC: TERM cells verify subscriber identity (KYC) and maintain the integrity of telecom databases. NATGRID relies on these audited records to ensure the accuracy of telephone-to-identity mapping. TERM

    Read more →
  • Cambridge Semantics

    Cambridge Semantics

    Cambridge Semantics is a privately held company headquartered in Boston, Massachusetts with an office in San Diego, California. The company is an enterprise big data management and exploratory analytics software company. == History == Cambridge Semantics was founded in 2007 by Sean Martin, Lee Feigenbaum, Simon Martin, Rouben Meschian, Ben Szekely and Emmett Eldred who all previously worked at IBM's Advanced Technology Internet Group. In 2012, Cambridge Semantics appointed Chuck Pieper as chief executive. Pieper was previously at Credit Suisse. In January 2016, Cambridge Semantics acquired SPARQL City and its graph database intellectual property. On April 18, 2024, Altair Engineering acquired Cambridge Semantics. On 26 March 2025, Siemens announced the acquisition of Altair. == Products == Anzo Smart Data Lake uses Semantic Web Technologies. It allows IT departments and their business users to access data. AnzoGraph DB Graph database. AnzoGraph DB is a massively parallel processing (MPP) native graph database built for diverse data harmonization and analytics at scale (trillions of triples and more), speed and deep link insights. It is used for embedded analytics that require graph algorithms, graph views, named queries, aggregates, geospatial, built-in data science functions, data warehouse-style BI and reporting functions. It allows users to load and query RDF data using SPARQL or Cypher for OLAP-style analytics. == Marketing == Cambridge Semantics named SIIA Codie award 2018 finalist. Cambridge Semantics named 2018 Gold Stevie Award Winner for 'Big Data Solutions'. Cambridge Semantics named KMWorld’s 2018 ‘100 Companies That Matter in Knowledge Management’. Cambridge Semantics named to Database Trends and Applications' 'Trend-Setting Products in Data and Information Management for 2018'. Cambridge Semantics named to KMWorld Trend-Setting Products of 2017. Cambridge Semantics named to Database Trends and Applications 'DBTA 100: The Companies That Matter Most in Data'. Cambridge Semantics named SIIA Codie award 2017 winner for ‘Best Text Analytics and Semantic Technology Solution’. Cambridge Semantics named 2017 Silver Stevie Award Winner for 'Big Data Solutions'. Cambridge Semantics named KMWorld’s 2017 ‘100 Companies That Matter in Knowledge Management’. Cambridge Semantics named SIIA Codie award 2016 finalist. Cambridge Semantics named KMWorld’s 2016 ‘100 Companies That Matter in Knowledge Management’ and KMWorld Trend-Setting Products of 2015. Cambridge Semantics named 2016 Bio-IT World Best of Show People's Choice Award Contenders and 2015 Bio-IT best of show finalist. Anzo Insider Trading Investigation and Surveillance named 2015 CODiE Award finalist. Cambridge Semantics Selected as Finalist for 2014 MIT Sloan CIO Symposium's Innovation Showcase. Cambridge Semantics named SIIA CODiE Award 2014 finalist. Cambridge Semantics Win 2013 SIIA CODiE Award for best business intelligence and analytics solution. Cambridge Semantics wins KMWorld 2012 Promise Award. Cambridge Semantics wins Best of Show at 2012 Bio-IT World Conference.

    Read more →
  • Lexical Markup Framework

    Lexical Markup Framework

    Language resource management – Lexical markup framework (LMF; ISO 24613), produced by ISO/TC 37, is the ISO standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The scope is standardization of principles and methods relating to language resources in the contexts of multilingual communication. == Objectives == The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of large number of individual electronic resources to form extensive global electronic resources. Types of individual instantiations of LMF can include monolingual, bilingual or multilingual lexical resources. The same specifications are to be used for both small and large lexicons, for both simple and complex lexicons, for both written and spoken lexical representations. The descriptions range from morphology, syntax, computational semantics to computer-assisted translation. The covered languages are not restricted to European languages but cover all natural languages. The range of targeted NLP applications is not restricted. LMF is able to represent most lexicons, including WordNet, EDR and PAROLE lexicons. == History == In the past, lexicon standardization has been studied and developed by a series of projects like GENELEX, EDR, EAGLES, MULTEXT, PAROLE, SIMPLE and ISLE. Then, the ISO/TC 37 National delegations decided to address standards dedicated to NLP and lexicon representation. The work on LMF started in Summer 2003 by a new work item proposal issued by the US delegation. In Fall 2003, the French delegation issued a technical proposition for a data model dedicated to NLP lexicons. In early 2004, the ISO/TC 37 committee decided to form a common ISO project with Nicoletta Calzolari (CNR-ILC Italy) as convenor and Gil Francopoulo (Tagmatica France) and Monte George (ANSI, United States) as editors. The first step in developing LMF was to design an overall framework based on the general features of existing lexicons and to develop a consistent terminology to describe the components of those lexicons. The next step was the actual design of a comprehensive model that best represented all of the lexicons in detail. A large panel of 60 experts contributed a wide range of requirements for LMF that covered many types of NLP lexicons. The editors of LMF worked closely with the panel of experts to identify the best solutions and reach a consensus on the design of LMF. Special attention was paid to the morphology in order to provide powerful mechanisms for handling problems in several languages that were known as difficult to handle. 13 versions have been written, dispatched (to the National nominated experts), commented and discussed during various ISO technical meetings. After five years of work, including numerous face-to-face meetings and e-mail exchanges, the editors arrived at a coherent UML model. In conclusion, LMF should be considered a synthesis of the state of the art in NLP lexicon field. == Current stage == The ISO number is 24613. The LMF specification has been published officially as an International Standard on 17 November 2008. == As one of the members of the ISO/TC 37 family of standards == The ISO/TC 37 standards are currently elaborated as high level specifications and deal with word segmentation (ISO 24614), annotations (ISO 24611 a.k.a. MAF, ISO 24612 a.k.a. LAF, ISO 24615 a.k.a. SynAF, and ISO 24617-1 a.k.a. SemAF/Time), feature structures (ISO 24610), multimedia containers (ISO 24616 a.k.a. MLIF), and lexicons (ISO 24613). These standards are based on low level specifications dedicated to constants, namely data categories (revision of ISO 12620), language codes (ISO 639), scripts codes (ISO 15924), country codes (ISO 3166) and Unicode (ISO 10646). The two level organization forms a coherent family of standards with the following common and simple rules: the high level specification provides structural elements that are adorned by the standardized constants; the low level specifications provide standardized constants as metadata. == Key standards == The linguistics constants like /feminine/ or /transitive/ are not defined within LMF but are recorded in the Data Category Registry (DCR) that is maintained as a global resource by ISO/TC 37 in compliance with ISO/IEC 11179-3:2003. And these constants are used to adorn the high level structural elements. The LMF specification complies with the modeling principles of Unified Modeling Language (UML) as defined by Object Management Group (OMG). The structure is specified by means of UML class diagrams. The examples are presented by means of UML instance (or object) diagrams. An XML DTD is given in an annex of the LMF document. == Model structure == LMF is composed of the following components: The core package that is the structural skeleton which describes the basic hierarchy of information in a lexical entry. Extensions of the core package which are expressed in a framework that describes the reuse of the core components in conjunction with the additional components required for a specific lexical resource. The extensions are specifically dedicated to morphology, MRD, NLP syntax, NLP semantics, NLP multilingual notations, NLP morphological patterns, multiword expression patterns, and constraint expression patterns. == Example == In the following example, the lexical entry is associated with a lemma clergyman and two inflected forms clergyman and clergymen. The language coding is set for the whole lexical resource. The language value is set for the whole lexicon as shown in the following UML instance diagram. The elements Lexical Resource, Global Information, Lexicon, Lexical Entry, Lemma, and Word Form define the structure of the lexicon. They are specified within the LMF document. On the contrary, languageCoding, language, partOfSpeech, commonNoun, writtenForm, grammaticalNumber, singular, plural are data categories that are taken from the Data Category Registry. These marks adorn the structure. The values ISO 639-3, clergyman, clergymen are plain character strings. The value eng is taken from the list of languages as defined by ISO 639-3. With some additional information like dtdVersion and feat, the same data can be expressed by the following XML fragment: This example is rather simple, while LMF can represent much more complex linguistic descriptions the XML tagging is correspondingly complex. == Selected publications about LMF == The first publication about the LMF specification as it has been ratified by ISO (this paper became (in 2015) the 9th most cited paper within the Language Resources and Evaluation conferences from LREC papers): Language Resources and Evaluation LREC-2006/Genoa: Gil Francopoulo, Monte George, Nicoletta Calzolari, Monica Monachini, Nuria Bel, Mandy Pet, Claudia Soria: Lexical Markup Framework (LMF) About semantic representation: Gesellschaft für linguistische Datenverarbeitung GLDV-2007/Tübingen: Gil Francopoulo, Nuria Bel, Monte George Nicoletta Calzolari, Monica Monachini, Mandy Pet, Claudia Soria: Lexical Markup Framework ISO standard for semantic information in NLP lexicons About African languages: Traitement Automatique des langues naturelles, Marseille, 2014: Mouhamadou Khoule, Mouhamad Ndiankho Thiam, El Hadj Mamadou Nguer: Toward the establishment of a LMF-based Wolof language lexicon (Vers la mise en place d'un lexique basé sur LMF pour la langue wolof) [in French] About Asian languages: Lexicography, Journal of ASIALEX, Springer 2014: Lexical Markup Framework: Gil Francopoulo, Chu-Ren Huang: An ISO Standard for Electronic Lexicons and its Implications for Asian Languages DOI 10.1007/s40607-014-0006-z About European languages: COLING 2010: Verena Henrich, Erhard Hinrichs: Standardizing Wordnets in the ISO Standard LMF: Wordnet-LMF for GermaNet EACL 2012: Judith Eckle-Kohler, Iryna Gurevych: Subcat-LMF: Fleshing out a standardized format for subcategorization frame interoperability EACL 2012: Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M Meyer, Christian Wirth: UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF. About Semitic languages: Journal of Natural Language Engineering, Cambridge University Press (to appear in Spring 2015): Aida Khemakhem, Bilel Gargouri, Abdelmajid Ben Hamadou, Gil Francopoulo: ISO Standard Modeling of a large Arabic Dictionary. Proceedings of the seventh Global Wordnet Conference 2014: Nadia B M Karmani, Hsan Soussou, Adel M Alimi: Building a standardized Wordnet in the ISO LMF for aeb language. Proceedings of the workshop: HLT & NLP within Arabic world, LREC 2008: Noureddine Loukil, Kais Haddar, Abdelmajid Ben Hamadou: Towards a syntactic lexicon of Arabic Verbs. Traitement Automatique des Langues Naturelles, Toulouse (in French) 2007: Khemakhem A, Gargouri B, Abdelwahed A, Francopoulo G: Modélisation des paradigmes de fl

    Read more →
  • Critical data studies

    Critical data studies

    Critical data studies is the exploration of and engagement with social, cultural, and ethical challenges that arise when working with big data. It is through various unique perspectives and taking a critical approach that this form of study can be practiced. As its name implies, critical data studies draws heavily on the influence of critical theory, which has a strong focus on addressing the organization of power structures. This idea is then applied to the study of data. Interest in this unique field of critical data studies began in 2011 with scholars danah boyd and Kate Crawford posing various questions for the critical study of big data and recognizing its potential threatening impacts on society and culture. It was not until 2014, and more exploration and conversations, that critical data studies was officially coined by scholars Craig Dalton and Jim Thatcher. They put a large emphasis on understanding the context of big data in order to approach it more critically. Researchers such as David Ribes, Robert Soden, Seyram Avle, Sarah E. Fox, and Phoebe Sengers focus on understanding data as a historical artifact and taking an interdisciplinary approach towards critical data studies. Other key scholars in this discipline include Rob Kitchin and Tracey P. Lauriault who focus on reevaluating data through different spheres. Various critical frameworks that can be applied to analyze big data include Feminist, Anti-Racist, Queer, Indigenous, Decolonial, Anti-Ableist, as well as Symbolic and Synthetic data science. These frameworks help to make sense of the data by addressing power, biases, privacy, consent, and underrepresentation or misrepresentation concerns that exist in data as well as how to approach and analyze this data with a more equitable mindset. == Motivation == In their article in which they coin the term 'critical data studies,' Dalton and Thatcher also provide several justifications as to why data studies is a discipline worthy of a critical approach. First, 'big data' is an important aspect of twenty-first century society, and the analysis of 'big data' allows for a deeper understanding of what is happening and for what reasons. Big data is important to critical data studies because it is the type of data used within this field. Big data does not necessarily refer to a large data set, it can have a data set with millions of rows, but also a data set that just has a wide variety and expansive scope of data with a smaller type of dataset. As well as having whole populations in the data set and not just sample sizes. Furthermore, big data as a technological tool and the information that it yields are not neutral, according to Dalton and Thatcher, making it worthy of critical analysis in order to identify and address its biases. Building off this idea, another justification for a critical approach is that the relationship between big data and society is an important one, and therefore worthy of study. Ribes et. al. argue there is a need for an interdisciplinary understanding of data as a historical artifact as a motivating aspect of critical data studies.The overarching consensus in the Computer-Supported Cooperative Work (CSCW) field, is that people should speak for the data, and not let the data speak for itself. The sources of big data and it’s relationship to varied metadata can be a complicated one, which leads to data disorder and a need for an ethical analysis. Additionally, Iliadis and Russo (2016) have called for studying data assemblages. This is to say, data has innate technological, political, social, and economic histories that should be taken into consideration. Kitchin argues data is almost never raw, and it is almost always cooked, meaning that it is always spoken for by the data scientists utilizing it. Thus, Big Data should be open to a variety of perspectives, especially those of cultural and philosophical nature. Further, data contains hidden histories, ideologies, and philosophies. Big data technology can cause significant changes in society's structure and in the everyday lives of people, and, being a product of society, big data technology is worthy of sociological investigation. Moreover, data sets are almost never completely without any influence. Rather, data are shaped by the vision or goals of those gathering the data, and during the data collection process, certain things are quantified, stored, sorted and even discarded by the research team. A critical approach is thus necessary in order to understand and reveal the intent behind the information being presented.One of these critical approaches has been through feminist data studies. This method applies feminist principles to critical studies and data collecting and analysis. The goal of this is to address the power imbalance in data science and society. According to Catherine D’Ignazio and Lauren F. Klein, a power analysis can be performed by examining power, challenging power, evaluating emotion and embodiment, rethinking binaries and hierarchies, embracing pluralism, considering context, and making labor visible. Feminist data studies is part of the movement towards making data to benefit everyone and not to increase existing inequalities. Moreover, data alone cannot speak for themselves; in order to possess any concrete meaning, data must be accompanied by theoretical insight or alternative quantitative or qualitative research measures. Based on different social topics such as anti-racist data studies, critical data studies give a focus on those social issues concerning data. Specifically in anti-racist data studies they use a classification approach to get representation for those within that community. Desmond Upton Patton and others used their own classification system in the communities of Chicago to help target and reduce violence with young teens on twitter. They had students in those communities help them to decipher the terminology and emojis of these teens to target the language used in tweets that followed with violence outside of the computer screens. This is just one real world example of critical data studies and its application. Dalton and Thatcher argue that if one were to only think of data in terms of its exploitative power, there is no possibility of using data for revolutionary, liberatory purposes. Finally, Dalton and Thatcher propose that a critical approach in studying data allows for 'big data' to be combined with older, 'small data,' and thus create more thorough research, opening up more opportunities, questions and topics to be explored. == Issues and concerns for critical data scholars == Data plays a pivotal role in the emerging knowledge economy, driving productivity, competitiveness, efficiency, sustainability, and capital accumulation. The ethical, political, and economic dimensions of data dynamically evolve across space and time, influenced by changing regimes, technologies, and priorities. Technically, the focus lies on handling, storing, and analyzing vast data sets, utilizing machine learning-based data mining and analytics. This technological advancement raises concerns about data quality, encompassing validity, reliability, authenticity, usability, and lineage. The use of data in modern society brings about new ways of understanding and measuring the world, but also brings with it certain concerns or issues. Data scholars attempt to bring some of these issues to light in their quest to be critical of data. Technical and organizational issues could include the scope of the data set, meaning there is too little or too much data to work with, leading to inaccurate results. It becomes crucial for critical data scholars to carefully consider the adequacy of data volume for their analyses. The quality of the data itself is another facet of concern. The data itself could be of poor quality, such as an incomplete or messy data set with missing or inaccurate data values. This would lead researchers to have to make edits and assumptions about the data itself. Addressing these issues often requires scholars to make edits and assumptions about the data to ensure its reliability and relevance. Data scientists could have improper access to the actual data set, limiting their abilities to analyze it. Linnet Taylor explains how gaps in data can arise when people of varying levels of power have certain rights to their data sources. These people in power can control what data is collected, how it is displayed and how it is analyzed. The capabilities of the research team also play a crucial role in the quality of data analytics. The research team may have inadequate skills or organizational capabilities which leads to the actual analytics performed on the dataset to be biased. This can also lead to ecological fallacies, meaning an assumption is made about an individual based on data or results from a larger group of people. These technical and organizational challenges highlight the complexity of working with data and

    Read more →
  • Feistel cipher

    Feistel cipher

    In cryptography, a Feistel cipher (also known as Luby–Rackoff block cipher) is a symmetric structure used in the construction of block ciphers, named after the German-born physicist and cryptographer Horst Feistel, who did pioneering research while working for IBM; it is also commonly known as a Feistel network. A large number of block ciphers use the scheme, including the US Data Encryption Standard, the Soviet/Russian GOST (aka Magma) and the more recent Blowfish and Twofish ciphers. In a Feistel cipher, encryption and decryption are very similar operations, and both consist of iteratively running a function called a "round function" a fixed number of times. == History == Many modern symmetric block ciphers are based on Feistel networks. Feistel networks were first seen commercially in IBM's Lucifer cipher, designed by Horst Feistel and Don Coppersmith in 1973. Feistel networks gained respectability when the U.S. Federal Government adopted the DES (a cipher based on Lucifer, with changes made by the NSA) in 1976. Like other components of the DES, the iterative nature of the Feistel construction makes implementing the cryptosystem in hardware easier (particularly on the hardware available at the time of DES's design). == Design == A Feistel network uses a round function, a function which takes two inputs – a data block and a subkey – and returns one output of the same size as the data block. In each round, the round function is run on half of the data to be encrypted, and its output is XORed with the other half of the data. This is repeated a fixed number of times, and the final output is the encrypted data. An important advantage of Feistel networks compared to other cipher designs such as substitution–permutation networks (SP-networks) is that the entire operation is guaranteed to be invertible (that is, encrypted data can be decrypted), even if the round function is not itself invertible. The round function can be made arbitrarily complicated, since it does not need to be designed to be invertible. Furthermore, the encryption and decryption operations are very similar, even identical in some cases, requiring only a reversal of the key schedule. Therefore, the size of the code or circuitry required to implement such a cipher is nearly halved. Unlike SP-networks, Feistel networks also do not depend on a substitution box that could cause timing side-channels in software implementations. == Theoretical work == The structure and properties of Feistel ciphers have been extensively analyzed by cryptographers. Michael Luby and Charles Rackoff analyzed the Feistel cipher construction and proved that if the round function is a cryptographically secure pseudorandom function, with Ki used as the seed, then 3 rounds are sufficient to make the block cipher a pseudorandom permutation, while 4 rounds are sufficient to make it a "strong" pseudorandom permutation (which means that it remains pseudorandom even to an adversary who gets oracle access to its inverse permutation). Because of this very important result of Luby and Rackoff, Feistel ciphers are sometimes called Luby–Rackoff block ciphers. Further theoretical work has generalized the construction somewhat and given more precise bounds for security. == Construction details == Let F {\displaystyle \mathrm {F} } be the round function and let K 0 , K 1 , … , K n {\displaystyle K_{0},K_{1},\ldots ,K_{n}} be the sub-keys for the rounds 0 , 1 , … , n {\displaystyle 0,1,\ldots ,n} respectively. Then the basic operation is as follows: Split the plaintext block into two equal pieces: ( L 0 {\displaystyle L_{0}} , R 0 {\displaystyle R_{0}} ). For each round i = 0 , 1 , … , n {\displaystyle i=0,1,\dots ,n} , compute L i + 1 = R i , {\displaystyle L_{i+1}=R_{i},} R i + 1 = L i ⊕ F ( R i , K i ) , {\displaystyle R_{i+1}=L_{i}\oplus \mathrm {F} (R_{i},K_{i}),} where ⊕ {\displaystyle \oplus } means XOR. Then the ciphertext is ( R n + 1 , L n + 1 ) {\displaystyle (R_{n+1},L_{n+1})} . Decryption of a ciphertext ( R n + 1 , L n + 1 ) {\displaystyle (R_{n+1},L_{n+1})} is accomplished by computing for i = n , n − 1 , … , 0 {\displaystyle i=n,n-1,\ldots ,0} R i = L i + 1 , {\displaystyle R_{i}=L_{i+1},} L i = R i + 1 ⊕ F ⁡ ( L i + 1 , K i ) . {\displaystyle L_{i}=R_{i+1}\oplus \operatorname {F} (L_{i+1},K_{i}).} Then ( L 0 , R 0 ) {\displaystyle (L_{0},R_{0})} is the plaintext again. The diagram illustrates both encryption and decryption. Note the reversal of the subkey order for decryption; this is the only difference between encryption and decryption. === Unbalanced Feistel cipher === Unbalanced Feistel ciphers use a modified structure where L 0 {\displaystyle L_{0}} and R 0 {\displaystyle R_{0}} are not of equal lengths. The Skipjack cipher is an example of such a cipher. The Texas Instruments digital signature transponder uses a proprietary unbalanced Feistel cipher to perform challenge–response authentication. The Thorp shuffle is an extreme case of an unbalanced Feistel cipher in which one side is a single bit. This has better provable security than a balanced Feistel cipher but requires more rounds. There exists Type-1, Type-2, and Type-3 Feistel networks, where the Feistel function is one fourth the size of the block but operates a varying number of times within one round. === Other uses === The Feistel construction is also used in cryptographic algorithms other than block ciphers. For example, the optimal asymmetric encryption padding (OAEP) scheme uses a simple Feistel network to randomize ciphertexts in certain asymmetric-key encryption schemes. A generalized Feistel algorithm can be used to create strong permutations on small domains of size not a power of two (see format-preserving encryption). === Feistel networks as a design component === Whether the entire cipher is a Feistel cipher or not, Feistel-like networks can be used as a component of a cipher's design. For example, MISTY1 is a Feistel cipher using a three-round Feistel network in its round function, Skipjack is a modified Feistel cipher using a Feistel network in its G permutation, and Threefish (part of Skein) is a non-Feistel block cipher that uses a Feistel-like MIX function. == List of Feistel ciphers == Feistel or modified Feistel: Generalised Feistel: CAST-256 CLEFIA MacGuffin RC2 RC6 Skipjack SMS4

    Read more →
  • Cipher

    Cipher

    In cryptography, a cipher (or cypher) is an algorithm for performing encryption or decryption—a series of well-defined steps that can be followed as a procedure. An alternative, less common term is encipherment. To encipher or encode is to convert information into cipher or code. In common parlance, "cipher" is synonymous with "code", as they are both a set of steps that encrypt a message; however, the concepts are distinct in cryptography, especially classical cryptography. Codes generally substitute different length strings of characters in the output, while ciphers generally substitute the same number of characters as are input. A code maps one meaning with another. Words and phrases can be coded as letters or numbers. Codes typically have direct meaning from input to key. Codes primarily function to save time. Ciphers are algorithmic. The given input must follow the cipher's process to be solved. Ciphers are commonly used to encrypt written information. Codes operated by substituting according to a large codebook which linked a random string of characters or numbers to a word or phrase. For example, "UQJHSE" could be the code for "Proceed to the following coordinates.". When using a cipher the original information is known as plaintext, and the encrypted form as ciphertext. The ciphertext message contains all the information of the plaintext message, but is not in a format readable by a human or computer without the proper mechanism to decrypt it. The operation of a cipher usually depends on a piece of auxiliary information, called a key (or, in traditional NSA parlance, a cryptovariable). The encrypting procedure is varied depending on the key, which changes the detailed operation of the algorithm. A key must be selected before using a cipher to encrypt a message, with some exceptions such as ROT13 and Atbash. Most modern ciphers can be categorized in several ways: By whether they work on blocks of symbols usually of a fixed size (block ciphers), or on a continuous stream of symbols (stream ciphers). By whether the same key is used for both encryption and decryption (symmetric key algorithms), or if a different key is used for each (asymmetric key algorithms). If the algorithm is symmetric, the key must be known to the recipient and sender and to no one else. If the algorithm is an asymmetric one, the enciphering key is different from, but closely related to, the deciphering key. If one key cannot be deduced from the other, the asymmetric key algorithm has the public/private key property and one of the keys may be made public without loss of confidentiality. == Etymology == Originating from the Sanskrit word for zero शून्य (śuṇya), via the Arabic word صفر (ṣifr), the word "cipher" spread to Europe as part of the Arabic numeral system during the Middle Ages. The Roman numeral system lacked the concept of zero, and this limited advances in mathematics. In this transition, the word was adopted into Medieval Latin as cifra, and then into Middle French as cifre. This eventually led to the English word cipher (also spelt cypher). One theory for how the term came to refer to encoding is that the concept of zero was confusing to Europeans, and so the term came to refer to a message or communication that was not easily understood. The term cipher was later also used to refer to any Arabic digit, or to calculation using them, so encoding text in the form of Arabic numerals is literally converting the text to "ciphers". == Versus codes == In casual contexts, "code" and "cipher" can typically be used interchangeably; however, the technical usages of the words refer to different concepts. Codes contain meaning; words and phrases are assigned to numbers or symbols, creating a shorter message. An example of this is the commercial telegraph code which was used to shorten long telegraph messages which resulted from entering into commercial contracts using exchanges of telegrams. Another example is given by whole word ciphers, which allow the user to replace an entire word with a symbol or character, much like the way written Japanese utilizes Kanji (meaning Chinese characters in Japanese) characters to supplement the native Japanese characters representing syllables. An example using English language with Kanji could be to replace "The quick brown fox jumps over the lazy dog" by "The quick brown 狐 jumps 上 the lazy 犬". Stenographers sometimes use specific symbols to abbreviate whole words. Ciphers, on the other hand, work at a lower level: the level of individual letters, small groups of letters, or, in modern schemes, individual bits and blocks of bits. Some systems used both codes and ciphers in one system, using superencipherment to increase the security. In some cases the terms codes and ciphers are used synonymously with substitution and transposition, respectively. Historically, cryptography was split into a dichotomy of codes and ciphers, while coding had its own terminology analogous to that of ciphers: "encoding, codetext, decoding" and so on. However, codes have a variety of drawbacks, including susceptibility to cryptanalysis and the difficulty of managing a cumbersome codebook. Because of this, codes have fallen into disuse in modern cryptography, and ciphers are the dominant technique. == Types == There are a variety of different types of encryption. Algorithms used earlier in the history of cryptography are substantially different from modern methods, and modern ciphers can be classified according to how they operate and whether they use one or two keys. === Historical === The Caesar Cipher is one of the earliest known cryptographic systems. Julius Caesar used a cipher that shifts the letters in the alphabet in place by three and wrapping the remaining letters to the front to write to Marcus Tullius Cicero in approximately 50 BC. Historical pen and paper ciphers used in the past are sometimes known as classical ciphers. They include simple substitution ciphers (such as ROT13) and transposition ciphers (such as a Rail Fence Cipher). For example, "GOOD DOG" can be encrypted as "PLLX XLP" where "L" substitutes for "O", "P" for "G", and "X" for "D" in the message. Transposition of the letters "GOOD DOG" can result in "DGOGDOO". These simple ciphers and examples are easy to crack, even without plaintext-ciphertext pairs. In the 1640s, the Parliamentarian commander, Edward Montagu, 2nd Earl of Manchester, developed ciphers to send coded messages to his allies during the English Civil War. The English theologian John Wilkins published a book in 1641 titled "Mercury, or The Secret and Swift Messenger" and described a musical cipher wherein letters of the alphabet were substituted for music notes. This species of melodic cipher was depicted in greater detail by author Abraham Rees in his book Cyclopædia (1778). Simple ciphers were replaced by polyalphabetic substitution ciphers (such as the Vigenère) which changed the substitution alphabet for every letter. For example, "GOOD DOG" can be encrypted as "PLSX TWF" where "L", "S", and "W" substitute for "O". With even a small amount of known or estimated plaintext, simple polyalphabetic substitution ciphers and letter transposition ciphers designed for pen and paper encryption are easy to crack. It is possible to create a secure pen and paper cipher based on a one-time pad, but these have other disadvantages. During the early twentieth century, electro-mechanical machines were invented to do encryption and decryption using transposition, polyalphabetic substitution, and a kind of "additive" substitution. In rotor machines, several rotor disks provided polyalphabetic substitution, while plug boards provided another substitution. Keys were easily changed by changing the rotor disks and the plugboard wires. Although these encryption methods were more complex than previous schemes and required machines to encrypt and decrypt, other machines such as the British Bombe were invented to crack these encryption methods. === Modern === Modern encryption methods can be divided by two criteria: by type of key used, and by type of input data. By type of key used ciphers are divided into: symmetric key algorithms (Private-key cryptography), where one same key is used for encryption and decryption, and asymmetric key algorithms (Public-key cryptography), where two different keys are used for encryption and decryption. In a symmetric key algorithm (e.g., DES and AES), the sender and receiver must have a shared key set up in advance and kept secret from all other parties; the sender uses this key for encryption, and the receiver uses the same key for decryption. The design of AES (Advanced Encryption System) was beneficial because it aimed to overcome the flaws in the design of the DES (Data encryption standard). AES's designer's claim that the common means of modern cipher cryptanalytic attacks are ineffective against AES due to its design structure. Ciphers can be distinguished into two types by the type o

    Read more →
  • Snapshot isolation

    Snapshot isolation

    In databases, and transaction processing (transaction management), snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot. Snapshot isolation has been adopted by several major database management systems, such as InterBase, Firebird, Oracle, MySQL, PostgreSQL, SQL Anywhere, MongoDB and Microsoft SQL Server (2005 and later). The main reason for its adoption is that it allows better performance than serializability, yet still avoids most of the concurrency anomalies that serializability avoids (but not all). In practice snapshot isolation is implemented within multiversion concurrency control (MVCC), where generational values of each data item (versions) are maintained: MVCC is a common way to increase concurrency and performance by generating a new version of a database object each time the object is written, and allowing transactions' read operations of several last relevant versions (of each object). Snapshot isolation has been used to criticize the ANSI SQL-92 standard's definition of isolation levels, as it exhibits none of the "anomalies" that the SQL standard prohibited, yet is not serializable (the anomaly-free isolation level defined by ANSI). In spite of its distinction from serializability, snapshot isolation is sometimes referred to as serializable by Oracle. == Definition == A transaction executing under snapshot isolation appears to operate on a personal snapshot of the database, taken at the start of the transaction. When the transaction concludes, it will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was taken. Such a write–write conflict will cause the transaction to abort. In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other. Were the system serializable, such an anomaly would be impossible, as either T1 or T2 would have to occur "first", and be visible to the other. In contrast, snapshot isolation permits write skew anomalies. As a concrete example, imagine V1 and V2 are two balances held by a single person, Phil. The bank will allow either V1 or V2 to run a deficit, provided the total held in both is never negative (i.e. V1 + V2 ≥ 0). Both balances are currently $100. Phil initiates two transactions concurrently, T1 withdrawing $200 from V1, and T2 withdrawing $200 from V2. If the database guaranteed serializable transactions, the simplest way of coding T1 is to deduct $200 from V1, and then verify that V1 + V2 ≥ 0 still holds, aborting if not. T2 similarly deducts $200 from V2 and then verifies V1 + V2 ≥ 0. Since the transactions must serialize, either T1 happens first, leaving V1 = −$100, V2 = $100, and preventing T2 from succeeding (since V1 + (V2 − $200) is now −$200), or T2 happens first and similarly prevents T1 from committing. If the database is under snapshot isolation(MVCC), however, T1 and T2 operate on private snapshots of the database: each deducts $200 from an account, and then verifies that the new total is zero, using the other account value that held when the snapshot was taken. Since neither update conflicts, both commit successfully, leaving V1 = V2 = −$100, and V1 + V2 = −$200. Some systems built using multiversion concurrency control (MVCC) may support (only) snapshot isolation to allow transactions to proceed without worrying about concurrent operations, and more importantly without needing to re-verify all read operations when the transaction finally commits. This is convenient because MVCC maintains a series of recent history consistent states. The only information that must be stored during the transaction is a list of updates made, which can be scanned for conflicts fairly easily before being committed. However, MVCC systems (such as MarkLogic) will use locks to serialize writes together with MVCC to obtain some of the performance gains and still support the stronger "serializability" level of isolation. == Workarounds == Potential inconsistency problems arising from write skew anomalies can be fixed by adding (otherwise unnecessary) updates to the transactions in order to enforce the serializability property. Materialize the conflict Add a special conflict table, which both transactions update in order to create a direct write–write conflict. Promotion Have one transaction "update" a read-only location (replacing a value with the same value) in order to create a direct write–write conflict (or use an equivalent promotion, e.g. Oracle's SELECT FOR UPDATE). In the example above, we can materialize the conflict by adding a new table which makes the hidden constraint explicit, mapping each person to their total balance. Phil would start off with a total balance of $200, and each transaction would attempt to subtract $200 from this, creating a write–write conflict that would prevent the two from succeeding concurrently. However, this approach violates the normal form. Alternatively, we can promote one of the transaction's reads to a write. For instance, T2 could set V1 = V1, creating an artificial write–write conflict with T1 and, again, preventing the two from succeeding concurrently. This solution may not always be possible. In general, therefore, snapshot isolation puts some of the problem of maintaining non-trivial constraints onto the user, who may not appreciate either the potential pitfalls or the possible solutions. The upside to this transfer is better performance. == Terminology == Snapshot isolation is called "serializable" mode in Oracle and PostgreSQL versions prior to 9.1, which may cause confusion with the "real serializability" mode. There are arguments both for and against this decision; what is clear is that users must be aware of the distinction to avoid possible undesired anomalous behavior in their database system logic. == History == Snapshot isolation arose from work on multiversion concurrency control databases, where multiple versions of the database are maintained concurrently to allow readers to execute without colliding with writers. Such a system allows a natural definition and implementation of such an isolation level. InterBase, later owned by Borland, was acknowledged to provide SI rather than full serializability in version 4, and likely permitted write-skew anomalies since its first release in 1985. Unfortunately, the ANSI SQL-92 standard was written with a lock-based database in mind, and hence is rather vague when applied to MVCC systems. Berenson et al. wrote a paper in 1995 critiquing the SQL standard, and cited snapshot isolation as an example of an isolation level that did not exhibit the standard anomalies described in the ANSI SQL-92 standard, yet still had anomalous behaviour when compared with serializable transactions. In 2008, Cahill et al. showed that write-skew anomalies could be prevented by detecting and aborting "dangerous" triplets of concurrent transactions. This implementation of serializability is well-suited to multiversion concurrency control databases, and has been adopted in PostgreSQL 9.1, where it is known as Serializable Snapshot Isolation (SSI). When used consistently, this eliminates the need for the above workarounds. The downside over snapshot isolation is an increase in aborted transactions. This can perform better or worse than snapshot isolation with the above workarounds, depending on workload.

    Read more →
  • Social search

    Social search

    Social search is a behavior of retrieving and searching on a social searching engine that mainly searches user-generated content such as news, videos and images related search queries on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that combines traditional algorithms. The idea behind social search is that instead of ranking search results purely based on semantic relevance between a query and the results, a social search system also takes into account social relationships between the results and the searcher. The social relationships could be in various forms. For example, in LinkedIn people search engine, the social relationships include social connections between searcher and each result, whether or not they are in the same industries, work for the same companies, belong the same social groups, and go the same schools, etc. Social search may not be demonstrably better than algorithm-driven search. In the algorithmic ranking model that search engines used in the past, relevance of a site is determined after analyzing the text and content on the page and link structure of the document. In contrast, search results with social search highlight content that was created or touched by other users who are in the Social Graph of the person conducting a search. It is a personalized search technology with online community filtering to produce highly personalized results. Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms. Depending on the feature-set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. The principle behind social search is that human network oriented results would be more meaningful and relevant for the user, instead of computer algorithms deciding the results for specific queries. == Research and implementations == Over the years, there have been different studies, researches and some implementations of Social Search. In 2008, there were a few startup companies that focused on ranking search results according to one's social graph on social networks. Companies in the social search space include Sproose, Mahalo, Jumper 2.0, Scour, Wink, Eurekster, and Delver. Former efforts include Wikia Search. In 2008, a story on TechCrunch showed Google potentially adding in a voting mechanism to search results similar to Digg's methodology. This suggests growing interest in how social groups can influence and potentially enhance the ability of algorithms to find meaningful data for end users. There are also other services like Sentiment that turn search personal by searching within the users' social circles. In 2009, a startup project called HeyStaks (www.heystaks.com) developed a web browser plugin "HayStaks". HeyStaks applies social search through collaboration in web search as a way that leads to better search results. The main motivation for HeyStaks to work on this idea is to provide the user with features that search engines didn't provide at that time. For instance, different searches have indicated that about 70% of the time when user search for something, a friend or a coworker have found it already. Also, studies have shown that approximately, 30% of people who use online search, search for something that they have found before. The startup believe that they help avoid these kind of issues by providing a shared and rich search experience through a list of recommendations that get generated based on search results. In October 2009, Google rolled out its "Social Search"; after a time in beta, the feature was expanded to multiple languages in May 2011. Before the expansion however in 2010 Bing and Google were already taking into account re-tweets and Likes when providing search results. However, after a search deal with Twitter ended without renewal, Google began to retool its Social Search. In January 2012, Google released "Search plus Your World", a further development of Social Search. The feature, which is integrated into Google's regular search as an opt-out feature, pulls references to results from Google+ profiles. The goal was to deliver better, more relevant and personalized search results with this integration. This integration however had some problems in which Google+ still is not wildly adopted or has much usage among many users. Later on, Google was criticized by Twitter for the perceived potential impact of "Search plus Your World" upon web publishers, describing the feature's release to the public as a "bad day for the web", while Google replied that Twitter refused to allow deep search crawling by Google of Twitter's content. By Google integrating Google+, the company was encouraging users to switch to Google's social networking site in order to improve search results. One famous example occurred when Google showed a link to Mark Zuckerberg's dormant Google+ account rather than the active Facebook profile. In November 2014 these accusations started to die down because Google's Knowledge Graph started to finally show links to Facebook, Twitter, and other social media sites. In December 2008, Twitter had re-introduced their people search feature. While the interface had since changed significantly, it allows you to search either full names or usernames in a straight-forward search engine. In January 2013, Facebook announced a new search engine called Graph Search still in the beta stages. The goal was to allow users to prioritize results that were popular with their social circle over the general internet. Facebook's Graph search utilized Facebook's user generated content to target users. Although there have been different researches and studies in social search, social media networks have not vested enough interest in working with search engines. LinkedIn for example has taken steps to improve its own individual search functions in order to stray users from external search engines. Even Microsoft started working with Twitter in order to integrate some tweets into Bing's search results in November 2013. Yet Twitter has its own search engine which points out how much value their data has and why they would like to keep it in house. In the end though social search will never be truly comprehensive of the subjects that matter to people unless users opt to be completely public with their information. == Social discovery == Social discovery is the use of social preferences and personal information to predict what content will be desirable to the user. Technology is used to discover new people and sometimes new experiences shopping, meeting friends or even traveling. The discovery of new people is often in real-time, enabled by mobile apps. However, social discovery is not limited to meeting people in real-time, it also leads to sales and revenue for companies via social media. An example of retail would be the addition of social sharing with music, through the iTunes music store. There is a social component to discovering new music Social discovery is at the basis of Facebook's profitability, generating ad revenue by targeting the ads to users using the social connections to enhance the commercial appeal. == Social search engines == A social search engine in an aspect can be thought of as a search engine that provides an answer for a question from another answer by identifying a person in the answer. That can happen by retrieving a user submitted query and determining that the query is related to the question; and provides an answer, including the link to the resource, as part of search results that are responsive to the query. Few social search engines depend only on online communities. Depending on the feature-set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. Social search engines are considered a part of Web 2.0 because they use the collective filtering of online communities to elevate particularly interesting or relevant content using tagging. These descriptive tags add to the meta data embedded in Web pages, theoretically improving the results for particular keywords over time. A user will generally see suggested tags for a particular search term, indicating tags that have previously been added. An implementation of a social search engine is Aardvark. Aardvark is a social search engine that is based on the "village paradigm" which is about connecting the user who has a question with friends or friends of friends whom can answer his or her question. In Aadvark, a user ask a question in different ways that mostly involves online ways such as instant messaging, email, web input or other non-online ways such as text message or voice. The Aar

    Read more →
  • Intent-based network

    Intent-based network

    Intent-Based Networking (IBN) is an approach to network management that shifts the focus from manually configuring individual devices to specifying desired outcomes or business objectives, referred to as "intents". == Description == Rather than relying on low-level commands to configure the network, administrators define these high-level intents, and the network dynamically adjusts itself to meet these requirements. IBN simplifies the management of complex networks by ensuring that the network infrastructure aligns with the desired operational goals. For example, an implementer can explicitly state a network purpose with a policy such as "Allow hosts A and B to communicate with X bandwidth capacity" without the need to understand the detailed mechanisms of the underlying devices (e.g. switches), topology or routing configurations. == Architecture == Advances in Natural Language Understanding (NLU) systems, along with neural network-based algorithms like BERT, RoBERTa, GLUE, and ERNIE, have enabled the conversion of user queries into structured representations that can be processed by automated services. This capability is crucial for managing the increasing complexity of network services. Intent-Based Networking (IBN) leverages these advancements to simplify network management by abstracting network services, reducing operational complexity, and lowering costs. A proposed three-layered architecture integrates intent-based automation into network management systems. In the business layer, intents are based on Key Performance Indicators (KPIs) and Service Level Agreements (SLAs), reflecting business objectives. The intent layer evaluates and re-plans actions dynamically, where a Knowledge module abstracts and reasons about intents, while an Agent interfaces with network objects to execute actions. The data layer observes network objects, updates topology information, and interacts with the Knowledge and Agent modules to ensure accurate and timely responses to network changes. At the bottom, the network layer contains the physical infrastructure, transforming network data into a usable format for the intent layer to act upon.

    Read more →
  • Coreu

    Coreu

    COREU (French: Correspondance Européenne – Telex network of European correspondents, also EUKOR-Netzwerk in Austria) is a communication network of the European Union for the communication of the Council of the European Union, the European correspondents of the foreign ministries of the EU member states, permanent representatives of member states in Brussels, the European Commission, and the General Secretariat of the Council of the European Union. The European Parliament is not among the participants. COREU is the European equivalent of the American Secret Internet Protocol Router Network (SIPRNet, also known as Intelink-S). COREU's official aim is fast communication in case of crisis. The network enables a closer cooperation in matters regarding foreign affairs. In actuality the system's function exceeds that of mere communication, it also enables decision-making. COREU's first goal is to enable the exchange of information before and after decisions. Relaying upfront negotiations in preparation of meetings is the second goal. In addition, the system also allows the editing of documents and the decision-making, especially if there is little time. While the first two goals are preparatory measures for a shared foreign policy, the third is a methodical variant marked by practise that is defining for the image of the Common Foreign and Security Policy. == Members == (The following information dates from 2013): There is one representative in each of the capital cities in the EU.(since 1973) In Germany for example, this is the European correspondent (EU-KOR) from the Foreign Office. In Austria it is the European correspondent from the Referat II.1.a in the Federal Ministry for Europe, Integration and Foreign Affairs They are the correspondents (since 1982) for the European Commission They comprise the secretariat for the European Council They also make up the European External Action Service (EEAS) (responsible for foreign policy issues, since 1987) == Data volume and technical details == COREU functions as a spoke-hub distribution paradigm system with the hub in Brussels. The network is operated by the European Union Intelligence and Situation Centre (formerly Joint Situation Center, JSC). The technical infrastructure is located in a building of the European Council. COREU may be described as an advanced telex system with encrypted messages via dedicated terminals. Once a message has reached the destination, it is then redistributed via the local media. In contrast, messages of governments are transmitted via local media to the correspondents and from there delivered point-to-point to Brussels via COREU. In 2010, approximately 8500 communications had been distributed over this network. == History == A telex-based communication system under the name COREU was established in 1973. Originally, only the ministries of Foreign Affairs in the European capitals were connected to it. This telex system was replaced in 1997 by the mail system CORTESY (COREU Terminal Equipment System). The name was retained despite the technical innovation. COREU was reportedly compromised by hackers working for the People's Liberation Army Strategic Support Force, allowing for the theft of thousands of low-classified documents and diplomatic cables.

    Read more →
  • SeaTable

    SeaTable

    SeaTable is a no-code platform that allows users to develop and implement business processes. The cloud collaboration service SeaTable is marketed by the GmbH of the same name with headquarters in Mainz and additional offices in Berlin and Beijing, and developed by the same company as Seafile. == History == SeaTable is a collaborative database and low-code application platform developed as part of a joint venture between Seafile Ltd., a software company based in Guangzhou, China, and SeaTable GmbH, a German firm headquartered in Mainz. Founded in 2020, the project represents the international expansion of Seafile, a Chinese developer originally known for its file synchronization and sharing software. While SeaTable's cloud services and European client operations are managed by the German entity, the platform itself is developed in China by Seafile's engineering team. This cross-border structure, described by TechCrunch as an “unconventional path” for a Chinese startup expanding abroad, reflects Seafile's effort to maintain its product development in China while addressing growing scrutiny in Western markets over data governance and corporate control. In 2021, an innovation project led by the Cyber Innovation Hub at the IT School of the German Armed Forces started to evaluate the possibilities of a large-scale deployment at the German Armed Forces. The evaluation project is currently still ongoing. In 2022, SeaTable is optimizing its database backend to allow millions of records within one base in the future. The focus of development is increasingly on automation and visualization. In 2025, SeaTable introduced AI-powered automations with version 6. The update enabled the integration of large language models (LLMs) for text analysis and automated decision-making. SeaTable operates a self-hosted LLM on servers provided by Hetzner (Germany), while self-hosted deployments can connect to any compatible model. == Features == SeaTable combines the traditional capabilities of a spreadsheet such as Excel and supplements them with a wide range of functions for process automation and visualization as well as a fully comprehensive API. SeaTable is not a pure cloud solution, but can alternatively be installed on a private server and operated completely autonomously. In this way, the owner retains full control over their own data. The installation is done via Docker on a Linux server. == Security and privacy == While most no-code platforms exist only as SaaS solutions, SeaTable describes itself as a data-sparse European solution. While initially the SeaTable Cloud was hosted on Amazon AWS, the move to the German data centers of Swiss provider Exoscale then took place in May 2021. This was followed by the replacement of the Freshdesk cloud ticketing system with a self-hosted Zammad instance, and since April 2022 SeaTable has completely dispensed with all tracking cookies on its website.

    Read more →
  • Key (cryptography)

    Key (cryptography)

    A key in cryptography is a piece of information, usually a string of numbers or letters that are stored in a file, which, when processed through a cryptographic algorithm, can encode or decode cryptographic data. Based on the used method, the key can be different sizes and varieties, but in all cases, the strength of the encryption relies on the security of the key being maintained. A key's security strength is dependent on its algorithm, the size of the key, the generation of the key, and the process of key exchange. == Scope == The key is what is used to encrypt data from plaintext to ciphertext. There are different methods for utilizing keys and encryption. === Symmetric cryptography === Symmetric cryptography refers to the practice of the same key being used for both encryption and decryption. === Asymmetric cryptography === Asymmetric cryptography has separate keys for encrypting and decrypting. These keys are known as the public and private keys, respectively. == Purpose == Since the key protects the confidentiality and integrity of the system, it is important to be kept secret from unauthorized parties. With public key cryptography, only the private key must be kept secret, but with symmetric cryptography, it is important to maintain the confidentiality of the key. Kerckhoff's principle states that the entire security of the cryptographic system relies on the secrecy of the key. == Key sizes == Key size is the number of bits in the key defined by the algorithm. This size defines the upper bound of the cryptographic algorithm's security. The larger the key size, the longer it will take before the key is compromised by a brute force attack. Since perfect secrecy is not feasible for key algorithms, researches are now more focused on computational security. In the past, keys were required to be a minimum of 40 bits in length, however, as technology advanced, these keys were being broken quicker and quicker. As a response, restrictions on symmetric keys were enhanced to be greater in size. Currently, 2048 bit RSA is commonly used, which is sufficient for current systems. However, current RSA key sizes would all be cracked quickly with a powerful quantum computer. "The keys used in public key cryptography have some mathematical structure. For example, public keys used in the RSA system are the product of two prime numbers. Thus public key systems require longer key lengths than symmetric systems for an equivalent level of security. 3072 bits is the suggested key length for systems based on factoring and integer discrete logarithms which aim to have security equivalent to a 128 bit symmetric cipher." == Key generation == To prevent a key from being guessed, keys need to be generated randomly and contain sufficient entropy. The problem of how to safely generate random keys is difficult and has been addressed in many ways by various cryptographic systems. A key can directly be generated by using the output of a Random Bit Generator (RBG), a system that generates a sequence of unpredictable and unbiased bits. A RBG can be used to directly produce either a symmetric key or the random output for an asymmetric key pair generation. Alternatively, a key can also be indirectly created during a key-agreement transaction, from another key or from a password. Some operating systems include tools for "collecting" entropy from the timing of unpredictable operations such as disk drive head movements. For the production of small amounts of keying material, ordinary dice provide a good source of high-quality randomness. == Establishment scheme == The security of a key is dependent on how a key is exchanged between parties. Establishing a secured communication channel is necessary so that outsiders cannot obtain the key. A key establishment scheme (or key exchange) is used to transfer an encryption key among entities. Key agreement and key transport are the two types of a key exchange scheme that are used to be remotely exchanged between entities . In a key agreement scheme, a secret key, which is used between the sender and the receiver to encrypt and decrypt information, is set up to be sent indirectly. All parties exchange information (the shared secret) that permits each party to derive the secret key material. In a key transport scheme, encrypted keying material that is chosen by the sender is transported to the receiver. Either symmetric key or asymmetric key techniques can be used in both schemes. The Diffie–Hellman key exchange and Rivest-Shamir-Adleman (RSA) are the most two widely used key exchange algorithms. In 1976, Whitfield Diffie and Martin Hellman constructed the Diffie–Hellman algorithm, which was the first public key algorithm. The Diffie–Hellman key exchange protocol allows key exchange over an insecure channel by electronically generating a shared key between two parties. On the other hand, RSA is a form of the asymmetric key system which consists of three steps: key generation, encryption, and decryption. Key confirmation delivers an assurance between the key confirmation recipient and provider that the shared keying materials are correct and established. The National Institute of Standards and Technology recommends key confirmation to be integrated into a key establishment scheme to validate its implementations. == Management == Key management concerns the generation, establishment, storage, usage and replacement of cryptographic keys. A key management system (KMS) typically includes three steps of establishing, storing and using keys. The base of security for the generation, storage, distribution, use and destruction of keys depends on successful key management protocols. == Key vs password == A password is a memorized series of characters including letters, digits, and other special symbols that are used to verify identity. It is often produced by a human user or a password management software to protect personal and sensitive information or generate cryptographic keys. Passwords are often created to be memorized by users and may contain non-random information such as dictionary words. On the other hand, a key can help strengthen password protection by implementing a cryptographic algorithm which is difficult to guess or replace the password altogether. A key is generated based on random or pseudo-random data and can often be unreadable to humans. A password is less safe than a cryptographic key due to its low entropy, randomness, and human-readable properties. However, the password may be the only secret data that is accessible to the cryptographic algorithm for information security in some applications such as securing information in storage devices. Thus, a deterministic algorithm called a key derivation function (KDF) uses a password to generate the secure cryptographic keying material to compensate for the password's weakness. Various methods such as adding a salt or key stretching may be used in the generation.

    Read more →
  • Data lineage

    Data lineage

    Data lineage refers to the process of tracking how data is generated, transformed, transmitted and used across systems over time. It documents data's origins, transformations and movements, providing detailed visibility into its life cycle. This process simplifies the identification of errors in data analytics workflows, by enabling users to trace issues back to their root causes. Data lineage facilitates the ability to replay specific segments or inputs of the dataflow. This can be used in debugging or regenerating lost outputs. In database systems, this concept is closely related to data provenance, which involves maintaining records of inputs, entities, systems and processes that influence data. Data provenance provides a historical record of data origins and transformations. It supports forensic activities such as data-dependency analysis, error/compromise detection, recovery, auditing and compliance analysis: "Lineage is a simple type of why provenance." Data governance plays a critical role in managing metadata by establishing guidelines, strategies and policies. Enhancing data lineage with data quality measures and master data management adds business value. Although data lineage is typically represented through a graphical user interface (GUI), the methods for gathering and exposing metadata to this interface can vary. Based on the metadata collection approach, data lineage can be categorized into three types: Those involving software packages for structured data, programming languages and Big data systems. Data lineage information includes technical metadata about data transformations. Enriched data lineage may include additional elements such as data quality test results, reference data, data models, business terminology, data stewardship information, program management details and enterprise systems associated with data points and transformations. Data lineage visualization tools often include masking features that allow users to focus on information relevant to specific use cases. To unify representations across disparate systems, metadata normalization or standardization may be required. == Representation of data lineage == Representation broadly depends on the scope of the metadata management and reference point of interest. Data lineage provides sources of the data and intermediate data flow hops from the reference point with backward data lineage, leading to the final destination's data points and its intermediate data flows with forward data lineage. These views can be combined with end-to-end lineage for a reference point that provides a complete audit trail of that data point of interest from sources to their final destinations. As the data points or hops increase, the complexity of such representation becomes incomprehensible. Thus, the best feature of the data lineage view is the ability to simplify the view by temporarily masking unwanted peripheral data points. Tools with the masking feature enable scalability of the view and enhance analysis with the best user experience for both technical and business users. Data lineage also enables companies to trace sources of specific business data to track errors, implement changes in processes and implementing system migrations to save significant amounts of time and resources. Data lineage can improve efficiency in business intelligence BI processes. Data lineage can be represented visually to discover the data flow and movement from its source to destination via various changes and hops on its way in the enterprise environment. This includes how the data is transformed along the way, how the representation and parameters change and how the data splits or converges after each hop. A simple representation of the Data Lineage can be shown with dots and lines, where dots represent data containers for data points, and lines connecting them represent transformations the data undergoes between the data containers. Data lineage can be visualized at various levels based on the granularity of the view. At a very high-level, data lineage is visualized as systems that the data interacts with before it reaches its destination. At its most granular, visualizations at the data point level can provide the details of the data point and its historical behavior, attribute properties and trends and data quality of the data passed through that specific data point in the data lineage. The scope of the data lineage determines the volume of metadata required to represent its data lineage. Usually, data governance and data management of an organization determine the scope of the data lineage based on their regulations, enterprise data management strategy, data impact, reporting attributes and critical data elements of the organization. == Rationale == Distributed systems like Google Map Reduce, Microsoft Dryad, Apache Hadoop (an open-source project) and Google Pregel provide such platforms for businesses and users. However, even with these systems, Big Data analytics can take several hours, days or weeks to run, simply due to the data volumes involved. For example, a ratings prediction algorithm for the Netflix Prize challenge took nearly 20 hours to execute on 50 cores, and a large-scale image processing task to estimate geographic information took 3 days to complete using 400 cores. "The Large Synoptic Survey Telescope is expected to generate terabytes of data every night and eventually store more than 50 petabytes, while in the bioinformatics sector, the 12 largest genome sequencing houses in the world now store petabytes of data apiece. It is very difficult for a data scientist to trace an unknown or an unanticipated result. === Big data debugging === Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Machine learning, among other algorithms, is used to transform and analyze the data. Due to the large size of the data, there could be unknown features in the data. The massive scale and unstructured nature of data, the complexity of these analytics pipelines, and long runtimes pose significant manageability and debugging challenges. Even a single error in these analytics can be extremely difficult to identify and remove. While one may debug them by re-running the entire analytics through a debugger for stepwise debugging, this can be expensive due to the amount of time and resources needed. Auditing and data validation are other major problems due to the growing ease of access to relevant data sources for use in experiments, the sharing of data between scientific communities and use of third-party data in business enterprises. As such, more cost-efficient ways of analyzing data intensive scale-able computing (DISC) are crucial to their continued effective use. === Challenges in Big Data debugging === ==== Massive scale ==== According to an EMC/IDC study, 2.8 ZB of data were created and replicated in 2012. Furthermore, the same study states that the digital universe will double every two years between now and 2020, and that there will be approximately 5.2 TB of data for every person in 2020. Based on current technology, the storage of this much data will mean greater energy usage by data centers. ==== Unstructured data ==== Unstructured data usually refers to information that doesn't reside in a traditional row-column database. Unstructured data files often include text and multimedia content, such as e-mail messages, word processing documents, videos, photos, audio files, presentations, web pages and many other kinds of business documents. While these types of files may have an internal structure, they are still considered "unstructured" because the data they contain doesn't fit neatly into a database. The amount of unstructured data in enterprises is growing many times faster than structured databases are growing. Big data can include both structured and unstructured data, but IDC estimates that 90 percent of Big Data is unstructured data. The fundamental challenge of unstructured data sources is that they are difficult for non-technical business users and data analysts alike to unbox, understand and prepare for analytic use. Beyond issues of structure, the sheer volume of this type of data contributes to such difficulty. Because of this, current data mining techniques often leave out valuable information and make analyzing unstructured data laborious and expensive. In today's competitive business environment, companies have to find and analyze the relevant data they need quickly. The challenge is going through the volumes of data and accessing the level of detail needed, all at a high speed. The challenge only grows as the degree of granularity increases. One possible solution is hardware. Some vendors are using increased memory and parallel processing to crunch large volumes of data quickly. Another method is putting data in-memory but using a grid

    Read more →