Apache ORC

Apache ORC

Apache ORC (Optimized Row Columnar) is a free and open-source column-oriented data storage format. It is similar to the other columnar-storage file formats available in the Hadoop ecosystem such as RCFile and Parquet. It is used by most of the data processing frameworks Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop. In February 2013, the Optimized Row Columnar (ORC) file format was announced by Hortonworks in collaboration with Facebook. A calendar month later, the Apache Parquet format was announced, developed by Cloudera and Twitter. Apache ORC format is widely supported including Amazon Web Services' Glue,Google Cloud Platform's BigQuery, and Pandas (software). == History ==

Lynda Soderholm

Lynda Soderholm is a physical chemist at the U.S. Department of Energy's (DOE) Argonne National Laboratory with a specialty in f-block elements. She is a senior scientist and the lead of the Actinide, Geochemistry & Separation Sciences Theme within Argonne's Chemical Sciences and Engineering Division. Her specific role is the Separation Science group leader within Heavy Element Chemistry and Separation Science (HESS), directing basic research focused on low-energy methods for isolating lanthanide and actinide elements from complex mixtures. She has made fundamental contributions to understanding f-block chemistry and characterizing f-block elements. Soderholm became a Fellow of the American Association for the Advancement of Science (AAAS) in 2013, and is also an Argonne Distinguished Fellow. == Early life and education == Soderholm was awarded her PhD in 1982 by McMaster University under the direction of Prof John Greedan. Her dissertation focused on characterizing the structural and magnetic properties of a series of ternary f-ion oxides. After graduating, she was awarded a NATO postdoctoral fellow at the Centre national de la recherche scientifique in France from 1982 until 1985. After a short postdoctoral appointment as an Argonne postdoctoral fellow she was promoted to staff scientist the same year. Over several years, she moved up the ranks, becoming a senior chemist in 2001. She was also an adjunct professor at the University of Notre Dame from 2003 until 2007. In 2021, Soderholm was appointed interim Division Director for the Chemical Sciences and Engineering Division. == Career and research == === Uncovering structure of Yttrium-123 Superconductor === Early in her career, Soderholm focused on the characterizing the magnetic and electronic behavior of compounds containing f-ions (lanthanides and actinides) with a focus on high-Tc materials, compounds that are superconducting under usually high temperatures. She was part of the research group that first determined the structure of YBa2Cu3O7. Their discovery formed the foundation for the further developments in the broad field of superconductivity. === Understanding f-ion speciation in solution === Continuing her interest in the f-elements, Soderholm shifted her focus from solid-state materials to nanoparticles and solutions, taking advantage of advances in X-ray structural probes made available by synchrotron facilities. Building on her earlier work using neutron scattering, her team became the first to discover that plutonium exists in solution as tiny, well-defined nanoparticles. This work solved a longstanding problem in understanding transport of plutonium in the environment and resulted in the development of a new, patented approach to separating plutonium during nuclear reprocessing. === Using machine learning to evaluate molecular structures === Soderholm's more recent projects use machine learning to understand the influence of complex molecular structuring in solutions, in connection with low-energy processes for separation of f-block elements from complex mixtures. == Awards and honors == University of Chicago Board of Governors' Distinguished Performance Award, 2009. Fellow of the American Association for the Advancement of Science, 2013. Argonne Distinguished Fellow, 2016 DOE materials sciences research competition for Outstanding Scientific Accomplishments in Solid State Physics, 1987. == Select publications == Beno, M. A.; Soderholm, L.; Capone, D. W., II; Hinks, D. G.; Jorgensen, J. D.; Grace, J. D.; Schuller, I. K.; Segre, C. U.; Zhang, K., Structure of the single-phase high-temperature superconductor yttrium barium copper oxide (YBa2Cu3O7−δ). Appl. Phys. Lett. 1987, 51 (1), 57–9. Soderholm, L.; Zhang, K.; Hinks, D. G.; Beno, M. A.; Jorgensen, J. D.; Segre, C. U.; Schuller, I. K., Incorporation of praseodymium in YBa2Cu3O7−δ: electronic effects on superconductivity. Nature (London) 1987, 328 (6131), 604–5. Antonio, M. R.; Williams, C. W.; Soderholm, L., Berkelium redox speciation. Radiochim. Acta 2002, 90 (12), 851–856. Soderholm, L.; Skanthakumar, S.; Neuefeind, J., Determination of actinide speciation in solution using high-energy X-ray scattering. Anal. Bioanal. Chem. 2005, 383 (1), 48–55. Forbes, T. Z.; Burns, P. C.; Skanthakumar, S.; Soderholm, L., Synthesis, structure, and magnetism of Np2O5. J. Am. Chem. Soc. 2007, 129 (10), 2760–2761. Soderholm, L.; Almond, P. M.; Skanthakumar, S.; Wilson, R. E.; Burns, P. C., The structure of the plutonium oxide nanocluster [Pu38O56Cl54(H2O)8]14-. Angew. Chem., Int. Ed. 2008, 47 (2), 298–302. Jensen, M. P.; Gorman-Lewis, D.; Aryal, B.; Paunesku, T.; Vogt, S.; Rickert, P. G.; Seifert, S.; Lai, B.; Woloschak, G. E.; Soderholm, L., An iron-dependent and transferrin-mediated cellular uptake pathway for plutonium. Nat. Chem. Biol. 2011, 7 (8), 560–565. Wilson, R. E.; Skanthakumar, S.; Soderholm, L., Separation of Plutonium Oxide Nanoparticles and Colloids. Angew. Chem., Int. Ed. 2011, 50 (47), 11234–11237. Knope, K. E.; Soderholm, L., Solution and solid-state structural chemistry of actinide hydrates and their hydrolysis and condensation products. Chem. Rev. 2013, 113 (2), 944–994. Luo, G.; Bu, W.; Mihaylov, M.; Kuzmenko, I.; Schlossman, M. L.; Soderholm, L., X-ray reflectivity reveals a nonmonotonic ion-density profile perpendicular to the surface of ErCl3 aqueous solutions. J. Phys. Chem. C 2013, 117 (37), 19082–19090. Jin, G. B.; Lin, J.; Estes, S. L.; Skanthakumar, S.; Soderholm, L., Influence of countercation hydration enthalpies on the formation of molecular complexes: A thorium-nitrate example. J. Am. Chem. Soc. 2017, 139 (49), 18003–18008. == Patents == Solvent extraction system for plutonium colloids and other oxide nano-particles, (2016).

PGP word list

The PGP Word List ("Pretty Good Privacy word list", also called a biometric word list for reasons explained below) is a list of words for conveying data bytes in a clear unambiguous way via a voice channel. They are analogous in purpose to the NATO phonetic alphabet, except that a longer list of words is used, each word corresponding to one of the 256 distinct numeric byte values. == History and structure == The PGP Word List was designed in 1995 by Patrick Juola, a computational linguist, and Philip Zimmermann, creator of PGP. The words were carefully chosen for their phonetic distinctiveness, using genetic algorithms to select lists of words that had optimum separations in phoneme space. The candidate word lists were randomly drawn from Grady Ward's Moby Pronunciator list as raw material for the search, successively refined by the genetic algorithms. The automated search converged to an optimized solution in about 40 hours on a DEC Alpha, a particularly fast machine in that era. The Zimmermann–Juola list was originally designed to be used in PGPfone, a secure VoIP application, to allow the two parties to verbally compare a short authentication string to detect a man-in-the-middle attack (MiTM). It was called a biometric word list because the authentication depended on the two human users recognizing each other's distinct voices as they read and compared the words over the voice channel, binding the identity of the speaker with the words, which helped protect against the MiTM attack. The list can be used in many other situations where a biometric binding of identity is not needed, so calling it a biometric word list may be imprecise. Later, it was used in PGP to compare and verify PGP public key fingerprints over a voice channel. This is known in PGP applications as the "biometric" representation. When it was applied to PGP, the list of words was further refined, with contributions by Jon Callas. More recently, it has been used in Zfone and the ZRTP protocol, the successor to PGPfone. The list is actually composed of two lists, each containing 256 phonetically distinct words, in which each word represents a different byte value between 0 and 255. Two lists are used because reading aloud long random sequences of human words usually risks three kinds of errors: 1) transposition of two consecutive words, 2) duplicate words, or 3) omitted words. To detect all three kinds of errors, the two lists are used alternately for the even-offset bytes and the odd-offset bytes in the byte sequence. Each byte value is actually represented by two different words, depending on whether that byte appears at an odd or an even offset from the beginning of the byte sequence. The two lists are readily distinguished by the number of syllables; the odd list has words of three syllables, the even list has two. The two lists have a maximum word length of 11 and 9 letters, respectively. Using a two-list scheme was suggested by Zhahai Stewart. == Examples == Each byte in a bytestring is encoded as a single word. A sequence of bytes is rendered in network byte order, from left to right. For example, the leftmost (i.e. byte 0) is considered "even" and is encoded using the PGP Even Word table. The next byte to the right (i.e. byte 1) is considered "odd" and is encoded using the PGP Odd Word table. This process repeats until all bytes are encoded. Thus, "E582" produces "topmost Istanbul", whereas "82E5" produces "miser travesty". A PGP public key fingerprint that displayed in hexadecimal as E582 94F2 E9A2 2748 6E8B 061B 31CC 528F D7FA 3F19 would display in PGP Words (the "biometric" fingerprint) as topmost Istanbul Pluto vagabond treadmill Pacific brackish dictator goldfish Medusa afflict bravado chatter revolver Dupont midsummer stopwatch whimsical cowbell bottomless The order of bytes in a bytestring depends on endianness. == Other word lists for data == There are several other word lists for conveying data in a clear unambiguous way via a voice channel: the NATO phonetic alphabet maps individual letters and digits to individual words the S/KEY system maps 64 bit numbers to 6 short words of 1 to 4 characters each from a publicly accessible 2048-word dictionary. The same dictionary is used in RFC 1760 and RFC 2289. the Diceware system maps five base-6 random digits (almost 13 bits of entropy) to a word from a dictionary of 7,776 distinct words. the Electronic Frontier Foundation has published a set of improved word lists based on the same concept FIPS 181: Automated Password Generator converts random numbers into somewhat pronounceable "words". mnemonic encoding converts 32 bits of data into 3 words from a vocabulary of 1626 words. what3words encodes geographic coordinates in 3 dictionary words. the BIP39 standard permits encoding a cryptographic key of fixed size (128 or 256 bits, usually the unencrypted master key of a Cryptocurrency wallet) into a short sequence of readable words known as the seed phrase, for the purpose of storing the key offline. This is used in cryptocurrencies such as Bitcoin or Monero. Like the PGP word list, the Bytewords standard maps each possible byte to a word. There is only one list, rather than two. The words are uniformly four letters long and can be uniquely identified by their first and last letters

Thirst trap

A thirst trap is a type of social media post intended to entice viewers sexually. It refers to a viewer's "thirst", a colloquialism likening sexual frustration to dehydration, implying desperation, with the afflicted individual being described as "thirsty". The phrase entered into the lexicon in the late 1990s, but is most related to Internet slang that developed in the early 2010s. Its meaning has changed over time, previously referring to a graceless need for approval, affection or attention. == History == The term thirst trap originated within selfie culture, though its precise origins remain unclear. An early use of the phrase with reference to dehydration appears in the 1999 book Running for Dummies by Florence Griffith Joyner and John Hanc, where it referred to the deceptive sensation of thirst being quenched after initial fluid intake, advising continued hydration to avoid the so-called "thirst trap." The modern usage of thirst trap resurfaced around 2011 on platforms such as Twitter and Urban Dictionary, coinciding with the growing popularity of Snapchat, Instagram, and dating apps like Tinder and Grindr. In 2011, Urban Dictionary defined it as "any statement used to intentionally create attention or 'thirst'." By 2018, the term had entered mainstream discourse, appearing in outlets such as The New York Times and GQ without the need for explanation. == Usage of the term == Often, the term thirst trap describes an attractive picture of an individual that they post online. Thirst trap can also describe a digital heartthrob. For instance, former Canadian prime minister Justin Trudeau has been described as a political thirst trap. It has also been described as a modern form of "fishing for compliments". == Motivation == Thirst trapping may be driven by a variety of motives. Individuals often seek attention through "likes" and comments on social media, which can offer a temporary sense of validation and improved self-esteem. It can also serve as an outlet for expressing one's sexuality or enhancing a personal brand. In some cases, sharing such content may provide financial gain. Others might post thirst traps to cope with emotional distress, such as after breakup, or to spite a former lover. Sharing a thirst trap has also been used as a way to connect in times of social isolation (e.g. COVID-19 pandemic). From a physiological standpoint, endorphins and neurotransmitters like oxytocin and dopamine are released during sexual contact. It has been speculated outside of the academic setting that sharing and engaging with thirst traps may elicit similar pleasure responses. == Methodology == Methodologies have developed to take an optimal thirst trap photo. Reporting for Vice magazine, Graham Isador found several of his social network contacts spent a lot of time considering how to take the best photo and what text they should use. They considered angles and lighting. Sometimes they made use of the self-timer feature available on some cameras. Often, body parts are put on display without being too explicit (e.g. bulges of male genitalia, breast cleavage, abdominal muscles, pectoral muscles, backs, buttocks). Often, the thirst trap is accompanied by a caption. For instance, in October 2019, actress Tracee Ellis Ross posted bikini pictures on Instagram with a caption that included the message: "I've worked so hard to feel good in my skin and to build a life that truly matches me and I'm in it and it feels good. ... No filter, no retouch 47 year old thirst trap! Boom!" On Instagram, #ThirstTrapThursdays is a popular tag. Followers reply in turn after a posting. == Variations == "Gatsbying" is a variation of the thirst trap, where one puts posts on social media to attract the attention of a particular individual. The term alludes to the novel The Great Gatsby where the character Jay Gatsby would throw extravagant parties to attract the attention of his love interest, Daisy. "Instagrandstanding" is an alternative name for this. "Wholesome trapping" has developed, where one posts pictures of more meaningful aspects of life, such as spending time with friends or doing outdoor activities. == Criticism == Psychotherapist Lisa Brateman has criticized thirst traps as an unhealthy method of receiving external validation. This desire for external validation can be addictive. Thirst traps can cause pressure to maintain a good physical appearance, and therefore cause self-esteem issues. Additionally, thirst traps are often highly choreographed and thus present a distorted perception of reality. The manufacturing of thirst traps can be limited when one enters a relationship or with time as the body ages. In some cases, thirst traps can lead to harassment and online bullying. In April 2020, model Chrissy Teigen posted a video of herself wearing a black one-piece swimsuit, and she received a multitude of negative comments that constituted bullying and body shaming.

Data philanthropy

Data philanthropy refers to the practice of private companies donating corporate data. This data is usually donated to nonprofits or donation-run organizations that have difficulty keeping up with expensive data collection technology. The concept was introduced through the United Nations Global Pulse initiative in 2011 to explore corporate data assets for humanitarian, academic, and societal causes. For example, anonymized mobile data could be used to track disease outbreaks, or data on consumer actions may be shared with researchers to study public health and economic trends. == Definition == A large portion of data collected from the internet consists of user-generated content, such as blogs, social media posts, and information submitted through lead generation and data forms. Additionally, corporations gather and analyze consumer data to gain insight into customer behavior, identify potential markets, and inform investment decisions. United Nations Global Pulse director Robert Kirkpatrick has referred to this type of data as "massive passive data" or "data exhaust." == Challenges == While data philanthropy can enhance development policies, making users' private data available to various organizations raises concerns regarding privacy, ownership, and the equitable use of data. Different techniques, such as differential privacy and alphanumeric strings of information, can allow access to personal data while ensuring user anonymity. However, even if these algorithms work, re-identification may still be possible. Another challenge is convincing corporations to share their data. The data collected by corporations provides them with market competitiveness and insight regarding consumer behavior. Corporations may fear losing their competitive edge if they share the information they have collected with the public. Numerous moral challenges are also encountered. In 2016, Mariarosaria Taddeo, a digital ethics professor at the University of Oxford, proposed an ethical framework to address them. == Sharing strategies == The goal of data philanthropy is to create a global data commons where companies, governments, and individuals can contribute anonymous, aggregated datasets. The United Nations Global Pulse offers four different tactics that companies can use to share their data that preserve consumer anonymity: Share aggregated and derived data sets for analysis under nondisclosure agreements (NDA) Allow researchers to analyze data within the private company's own network under NDAs Real-Time Data Commons: data pooled and aggregated among multiple companies of the same industry to protect competitiveness Public/Private Alerting Network: companies mine data behind their own firewalls and share indicators == Application in various fields == Many corporations take part in data philanthropy, including social networking platforms (e.g., Facebook, Twitter), telecommunications providers (e.g., Verizon, AT&T), and search engines (e.g., Google, Bing). Collecting and sharing anonymized, aggregated user-generated data is made available through data-sharing systems to support research, policy development, and social impact initiatives. By participating in such efforts, these organizations contribute to causes regarded as beneficial to society, allowing institutions to give back meaningfully. With the onset of technological advancements, the sharing of data on a global scale and an in-depth analysis of these data structures could mitigate the effects of global issues such as natural disasters and epidemics. Robert Kirkpatrick, the Director of the United Nations Global Pulse, has argued that this aggregated information is beneficial for the common good and can lead to developments in research and data production in a range of varied fields. === Digital disease detection === Health researchers use digital disease detection by collecting data from various sources—such as social media platforms (e.g., Twitter, Facebook), mobile devices (e.g., cell phones, smartphones), online search queries, mobile apps, and sensor data from wearables and environmental sensors—to monitor and predict the spread of infectious diseases. This approach allows them to track and anticipate outbreaks of epidemics (e.g., COVID-19, Ebola), pandemics, vector-borne diseases (e.g., malaria, dengue fever), and respiratory illnesses (e.g., influenza, SARS), improving response and intervention strategies for the spread of diseases. In 2008, Centers for Disease Control and Prevention collaborated with Google and launched Google Flu Trends, a website that tracked flu-related searches and user locations to track the spread of the flu. Users could visit Google Flu Trends to compare the amount of flu-related search activity versus the reported numbers of flu outbreaks on a graphical map. One drawback of this method of tracking was that Google searches are sometimes performed due to curiosity rather than when an individual is suffering from the flu. According to Ashley Fowlkes, an epidemiologist in the CDC Influenza division, "The Google Flu Trends system tries to account for that type of media bias by modeling search terms over time to see which ones remain stable." Google Flu Trends is no longer publishing current flu estimates on the public website; however, visitors to the site can still view and download previous estimates. Current data can be shared with verified researchers. A study from the Harvard School of Public Health (HSPH), published in the October 12, 2012 issue of Science, discussed how phone data helped curb the spread of malaria in Kenya. The researchers mapped phone calls and texts made by 14,816,521 Kenyan mobile phone subscribers. When individuals left their primary living location, the destination and length of journey were calculated. This data was then compared to a 2009 malaria prevalence map to estimate the disease's commonality in each location. Combining all this information, the researchers could estimate the probability of an individual carrying malaria and map the movement of the disease. This research can be used to track the spread of similar diseases. === Humanitarian aid === Calling patterns of mobile phone users can determine the socioeconomic standings of the populace, which can be used to deduce "its access to housing, education, healthcare, and basic services such as water and electricity." Researchers from Columbia University and Karolinska Institute used daily SIM card location data from both before and after the 2010 Haiti earthquake to estimate the movement of people both in response to the earthquake and during the related 2010 Haiti cholera outbreak. Their research suggests that mobile phone data can provide rapid and accurate estimates of population movements during disasters and outbreaks of infectious disease. Big data can also provide information on looming disasters and can assist relief organizations in rapid-response and locating displaced individuals. By analyzing specific patterns within this 'big data', governments and NGOs can enhance responses to disruptive events such as natural disasters, disease outbreaks, and global economic crises. Leveraging real-time information enables a deeper understanding of individual well-being, allowing for more effective interventions. Corporations utilize digital services, such as human sensor systems, to detect and solve impending problems within communities. This is a strategy used by the private sector to anonymously share customer information for public benefit, while preserving user privacy. === Impoverished areas === Poverty still remains a worldwide issue, with over 2.5 billion people currently impoverished. Statistics indicate the widespread use of mobile phones, even within impoverished communities. Additional data can be collected through Internet access, social media, utility payments and governmental statistics. Data-driven activities can lead to the accumulation of 'big data', which in turn can assist international non-governmental organizations in documenting and evaluating the needs of underprivileged populations. Through data philanthropy, NGOs can distribute information while cooperating with governments and private companies. === Corporate === Data philanthropy incorporates aspects of social philanthropy by allowing corporations to create profound impacts through the act of giving back by dispersing proprietary datasets. The public sector collects and preserves information, considered an essential asset. Companies track and analyze users' online activities to gain insight into their needs related to new products and services. These companies view the welfare of the population as key to business expansion and progression by using their data to highlight global citizens' issues. Experts in the private sector emphasize the importance of integrating diverse data sources—such as retail, mobile, and social media data—to develop essential solutions for global challenges. In Data Philanthropy:

Wadhwani Institute for Artificial Intelligence

Wadhwani AI, based in Mumbai, Maharashtra, is an independent, non-profit institute. Founded in 2018, it is dedicated to developing Artificial intelligence solutions for social good. Their mission is to build AI-based innovations and solutions for underserved communities in developing countries, for a wide range of domains including agriculture, education, financial inclusion, healthcare, and infrastructure. == History and funding == The institute was founded with a $30 million philanthropic effort by the Wadhwani brothers, Romesh Wadhwani and Sunil Wadhwani. The institute was inaugurated and dedicated to the nation by Narendra Modi, the 14th Prime Minister of India. In 2019, the institute received a $2 million grant from Google.org to create technologies to help reduce crop losses in cotton farming, through integrated pest management. The United States Agency for International Development awarded $2 million to the institute in 2020 to develop tools, using mathematical modeling techniques and digital technologies such as artificial intelligence and machine learning, to forecast COVID-19 disease patterns, estimate resources needed, and plan interventions. == Collaboration == With assistance from Google, the Ministry of Agriculture and Farmers' Welfare and the Wadhwani AI developed Krishi 24/7, the first AI-powered automated agricultural news monitoring and analysis tool. Through better decision-making, Krishi 24/7 will support the identification of valuable news, provide timely notifications, and respond quickly to safeguard farmers' interests and advance sustainable agricultural growth. The application converts news articles into English after scanning them in several languages. It ensures that the ministry is informed in a timely manner about pertinent occurrences that are published online by extracting key information from news items, including the headline, crop name, event type, date, location, severity, summary, and source link. The National Center for Disease Control has effectively implemented a comparable automated surveillance and analysis tool for disease outbreaks.

Squeaky Dolphin

Squeaky Dolphin is a program developed by the Government Communications Headquarters (GCHQ), a British intelligence and security organization, to collect and analyze data from social media networks. The program was first revealed to the general public on NBC on 27 January 2014 based on documents previously leaked by Edward Snowden. == Scope of surveillance == According to a document of the GCHQ dated August 2012, the program enables broad, real-time surveillance of the following items: YouTube video views The Like button on Facebook. Facebook has since then encrypted the data. Blogspot/Blogger visits Twitter, which has however encrypted its communications since this presentation was made The program can be supplemented with commercially available analytic software to determine which videos are popular among residents of specific cities. The dashboard software chosen was made by Splunk. The presentation, which was originally shown to an NSA audience and was made public by the NBC, contains a note saying the program was "Not interested in individuals just broad trends!". However, "according to other Snowden documents" obtained by NBC, in 2010, "GCHQ exploited unencrypted data from Twitter to identify specific users around the world and target them with propaganda."