AI App Kisne Banaya

AI App Kisne Banaya — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Semantic analytics

    Semantic analytics

    Semantic analytics, also termed semantic relatedness, is the use of ontologies to analyze content in web resources. This field of research combines text analytics and Semantic Web technologies like RDF. Semantic analytics measures the relatedness of different ontological concepts. Some academic research groups that have active project in this area include Kno.e.sis Center at Wright State University among others. == History == An important milestone in the beginning of semantic analytics occurred in 1996, although the historical progression of these algorithms is largely subjective. In his seminal study publication, Philip Resnik established that computers have the capacity to emulate human judgement. Spanning the publications of multiple journals, improvements to the accuracy of general semantic analytic computations all claimed to revolutionize the field. However, the lack of a standard terminology throughout the late 1990s was the cause of much miscommunication. This prompted Budanitsky & Hirst to standardize the subject in 2006 with a summary that also set a framework for modern spelling and grammar analysis. In the early days of semantic analytics, obtaining a large enough reliable knowledge bases was difficult. In 2006, Strube & Ponzetto demonstrated that Wikipedia could be used in semantic analytic calculations. The usage of a large knowledge base like Wikipedia allows for an increase in both the accuracy and applicability of semantic analytics. == Methods == Given the subjective nature of the field, different methods used in semantic analytics depend on the domain of application. No singular methods is considered correct, however one of the most generally effective and applicable method is explicit semantic analysis (ESA). ESA was developed by Evgeniy Gabrilovich and Shaul Markovitch in the late 2000s. It uses machine learning techniques to create a semantic interpreter, which extracts text fragments from articles into a sorted list. The fragments are sorted by how related they are to the surrounding text. Latent semantic analysis (LSA) is another common method that does not use ontologies, only considering the text in the input space. == Applications == Entity linking Ontology building / knowledge base population Search and query tasks Natural language processing Spoken dialog systems (e.g., Amazon Alexa, Google Assistant, Microsoft's Cortana) Artificial intelligence Knowledge management The application of semantic analysis methods generally streamlines organizational processes of any knowledge management system. Academic libraries often use a domain-specific application to create a more efficient organizational system. By classifying scientific publications using semantics and Wikipedia, researchers are helping people find resources faster. Search engines like Semantic Scholar provide organized access to millions of articles.

    Read more →
  • Ballin' (Mustard and Roddy Ricch song)

    Ballin' (Mustard and Roddy Ricch song)

    "Ballin'" is a song by American record producer Mustard featuring American rapper Roddy Ricch. The track was released as the third single from Mustard's third studio album, Perfect Ten, on August 20, 2019, though it was available as early as the end of June 2019. The song and its accompanying video received acclaim from music critics, with Complex magazine naming it the Best Song of 2019. It peaked at number 11 on the Billboard Hot 100, marking Mustard's highest charting song in the US. The song received a nomination for Best Rap/Sung Performance at the 2020 Grammy Awards, making it the first time Ricch has been nominated for a Grammy and Mustard's first nomination as an artist. Later in 2019, the two released another collaboration, "High Fashion". == Background == Roddy Ricch revealed in an interview that the song was composed in late 2018, but Mustard wanted to keep it for his album, Perfect Ten, which he was still working on. The song was later included on the album, released in June 2019. Ricch said he knew the song was "hard enough" the first time he heard it, while Mustard proclaimed "this is going to be the one". == Composition and lyrics == "Ballin'" has a "rags to riches" theme. In its intro, the song samples girl group 702's 1997 top ten hit "Get It Together". The song features a "smooth, bouncy beat", with Roddy Ricch rapping about his come-up and ascent in the music industry. In the first verse, Ricch salutes fellow Los Angeles rapper, the late Nipsey Hussle and his girlfriend Lauren London: "I run these racks up with my queen like London and Nip". The line simultaneously references Ricch and Hussle's collaboration "Racks in the Middle", released earlier in 2019 as Hussle's last single before his death. Billboard's Heran Mamo noted that "in typical Hussle fashion", Roddy Ricch "narrates his life's hardships before delving into his newfound treasures". == Critical reception == The song was widely acclaimed by music critics. Charles Holmes of Rolling Stone magazine called it "a song of the year contender", while Complex and Billboard both named it as a "standout track" on the album. Pitchfork magazine included "Ballin'" in its list of The Best Rap Songs of 2019 and called it "the centerpiece of Mustard's underappreciated album Perfect Ten". Complex later named it the Best Song of 2019, calling it "a feel-good anthem so infectious you'll need antibiotics just to stop running it back". == Chart performance == "Ballin'" was at the time Mustard's highest charting song in the US, peaking at number 11 on the Billboard Hot 100. It was also Roddy Ricch's highest charting song, until he surpassed it a week later, with the release of his album track "The Box", which eventually reached number 1 on the chart. It reached number one on Billboard's Rhythmic Songs chart, becoming Mustard's second number one following "Pure Water" and Ricch's first number one. The song also topped the Rap Airplay chart. == Music video == The music video for the track was teased by Mustard on his Instagram page on September 29, 2019. The music video for the track was eventually released on October 2, 2019 to critical acclaim. The video features Mustard and Roddy Ricch driving a Lamborghini Aventador in Los Angeles, where they both are from, playing poker in a casino, and going to a strip club. This is contrasted with scenes in which Mustard and Roddy Ricch as children play cards with Monopoly money and playing with miniature toy Lamborghinis together, aspiring for wealth and luxury, representing how they went from "rags to riches". The video also pays tribute to rapper Nipsey Hussle, who had been killed a few months ago. == Live performances == On December 16, 2019, Roddy Ricch performed the song live, alongside an 8-piece orchestra, at Peppermint Club in Los Angeles for Audiomack's Trap Symphony series. Along with Mustard, he performed it at The Pop Out: Ken & Friends on June 19, 2024. == Other uses == The song can be heard on "Elyse's Skit", track 10 off Roddy Ricch's debut album Please Excuse Me for Being Antisocial. In the skit, which is an actual voicenote recording, the mother of a woman named Elyse sends her daughter a voicenote, with "Ballin'" playing in the background, while the mother proceeds to say "I can't get that damn song out my head", jokingly calling it "inappropriate music". Ricch called the skit "something natural". In 2023, AI covers of the song using models based on pop culture characters and real-world celebrities gained viral popularity. == Awards and nominations == 62nd Annual Grammy Awards == Charts == == Certifications ==

    Read more →
  • Document capture software

    Document capture software

    Document capture software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including: PDF, TIFF, JPG, BMP, etc. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process. == Typical features == Typical features of Document Capture Software include: Barcode recognition Patch Code recognition Separation Optical Character Recognition (OCR) Optical Mark Recognition (OMR) Quality Assurance Indexing Migration === Goal for implementation of a document capture solution === The goal for implementing a document capture solution is to reduce the amount of time spent scanning, separating, enhancing, organizing, classifying, normalizing, and collecting information from document collections, and to produce metadata along with an image/PDF file, and/or OCR text. This information is then migrated to a file share, FTP site, database, Document Management or Enterprise Content Management system. These systems often provide a search function, allowing search of the assets based on the produced metadata, and then viewed using document imaging software. == General document capture system solutions == === Integration with document management system === ECM (Enterprise Content management) and their DMS component (Document Management System) are being adopted by many organizations as a corporate document management system for all types of electronic files, e.g. MS word, PDF ... However, much of the information held by organisations is on paper and this needs to be integrated within the same document repository. By converting paper documents into digital format through scanning, organizations convert paper into image formats such as TIF, JPG, and PDF, and also extract valuable index information or business data from the document using OCR technology. Digital documents and associated metadata can easily be stored in the ECM in a variety of formats. The most popular of these formats is PDF which not only provides an accurate representation of the document but also allows all the OCR text in the document to be stored behind the PDF image. This format is known as PDF with hidden text or text-searchable PDF. This allows users to search for documents by using keywords in the metadata fields or by searching the content of PDF files across the repository. ==== Advantages of scanning documents into a ECM/DMS ==== Information held on paper is usually just as valuable to organisations as the electronic documents that are generated internally. Often this information represents a large proportion of the day to day correspondence with suppliers and customers. Having the ability to manage and share this information internally through a document management system such as SharePoint or a CMIS-compatible repository improves collaboration between departments or employees and also eliminates the risk of losing this information through disasters such as floods or fire. Organisations adopting an ECM/DMS often implement electronic workflow which allows the information held on paper to be included as part of an electronic business process and incorporated into a customer record file along with other associated office documents and emails. For business critical documents, such as purchase orders and supplier invoices, digitising documents helps speed up business transactions as well as reduce manual effort involved in keying data into business systems, such as CRM, ERP and Accounting. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow. == Electronic document capture == In the earlier implementations of Document Capture Software, the technology focused solely on the digitization and capture of information from paper documents. Document images were acquired from document scanners via TWAIN/ISIS drivers. Only image-based file formats like TIF, JPG, and BMP were typically compatible with these solutions. But in recent years, as the volume of electronically-created documents and the number of proprietary file formats continues to increase at exponential rates, the need for handling documents existing in electronic formats has grown. The relevant document capture products have adapted to function with non-image file formats with the end-goal of creating a unified processing workflow capable of handling all incoming documents The ability to import files from a variety of sources is one example of such adaptation. Importing documents from ECM/DMS software solutions, email servers, FTP, and EDI is now as much of a requirement of document capture software as is paper capture. The normalization of output files to text-based PDF format is now another critical factor in long-term archival of proprietary electronic file formats. Normalization expands access and usage of files to users throughout the enterprise, rather than only those that created the original electronic file.

    Read more →
  • International Philosophical Bibliography

    International Philosophical Bibliography

    The International Philosophical Bibliography (IPB), also known in French as Répertoire bibliographique de la philosophie (RBP), is a bibliographic database covering publications on the history of philosophy and continental philosophy. The database comprises records of publications in over 30 languages. Annually, about 12,000 records are added. The indexes include, among other elements, over 84,000 names of authors, editors, translators, reviewers, and collaborators, as well as more than 3,000 commentaries on philosophical works, making it the world's most complete index in Philosophy. Since 1934, the IPB has been developed by the Higher Institute of Philosophy at the University of Louvain (UCLouvain), first in Leuven and since 1978 in Louvain-la-Neuve. The online version was launched by Peeters Publishers in 1997 and continues to be updated quarterly.

    Read more →
  • Artificial intelligence of things

    Artificial intelligence of things

    Artificial Intelligence of Things (AIoT) is the combination of artificial intelligence (AI) technologies with the Internet of things (IoT) infrastructure to create systems capable of sensing, learning, and acting on data without continuous human intervention. While IoT focuses on connectivity and sensor data collection, AI enables IoT devices to analyse data in real time and produce actionable outputs, including automated decisions at the edge. == Applications == === Manufacturing and predictive maintenance === Manufacturing accounts for the largest share of AIoT adoption by industry vertical. A common application is predictive maintenance, where sensors measuring vibration, temperature, current draw, and acoustic emissions feed machine learning models trained to detect signatures that precede equipment failure. These systems can flag developing faults weeks or months in advance, and in more advanced deployments can autonomously adjust machine parameters such as motor speed or cooling cycles to delay or prevent failure. === Other industries === In healthcare, AIoT enables remote patient monitoring through wearable devices that collect vital signs and apply AI models to detect anomalies or predict deterioration. In logistics, GPS and telematics sensors combined with AI models support real-time route optimisation, vehicle maintenance prediction, and fuel cost forecasting. Smart building systems use occupancy, temperature, and energy sensors with AI to dynamically adjust HVAC and lighting, reducing energy consumption. == Architecture == AIoT systems typically operate across three layers: a device layer of sensors and actuators that collect data, a connectivity layer that transmits data via protocols such as MQTT or HTTP, and a compute layer where AI models process the data either in the cloud or at the edge. The trend toward edge-based processing, where inference runs on low-cost processors near the data source rather than in a centralised cloud, has accelerated as hardware costs have fallen and applications increasingly require sub-second response times. == Market == Market sizing estimates for AIoT vary significantly depending on scope and definition. Fortune Business Insights valued the AIoT market at USD 35.65 billion in 2023, projecting growth to USD 253.86 billion by 2030 at a compound annual growth rate of 32.4%. Grand View Research estimated the broader market at USD 171.4 billion in 2024 with a CAGR of 31.7% through 2030, reflecting a wider definition that includes AI-integrated hardware components. North America accounted for approximately 40% of global market share in 2024, with the Asia-Pacific region projected as the fastest-growing market.

    Read more →
  • Manufacture Modules Technologies

    Manufacture Modules Technologies

    Manufacture Modules Technologies Sarl (MMT) is a Swiss company established in Geneva in 2015 which originally specialised in the development and commercialization of "Horological Smartwatch modules", firmware, apps and cloud. Located at Geneva's Skylab high-tech hub, it expanded into the development and manufacturing of "E-Straps" operated with a mobile application. Philippe Fraboulet is the CEO. == History == In June 2015, Fullpower Technologies and Union Horlogère Suisse (Swiss Watchmakers Corporation) formed MMT as a joint venture, which then launched the MotionX Horological Smartwatch Open Platform for the Swiss watch industry. The initial licensees were Frederique Constant, Alpina and Mondaine, brands owned by Union Horlogère Suisse. Fullpower created and managed the circuit design, firmware, smartphone applications (including sleep activity), as well as the cloud Infrastructure. MMT managed the Swiss watch movement development and production as well as licensing and support. In July 2016, Union Horlogere Holding and MMT were spun-out of the Frédérique Constant Group. Fullpower Technologies' 19.99% share was acquired by Union Horlogere Holding BV, giving it 100% of MMT's shares. == Business == The company offers firmware, a cloud, manufacturing, service and over-the-air facilities for upgrades. The company also offers its own apps, which bear the label “Swiss Made software”.

    Read more →
  • Harold Borko

    Harold Borko

    Harold Borko (1922-2012) was an American psychologist and researcher working primarily in the field of information science. == Biography == Borko was born in 1922 in New York City, New York. After serving in the US Army from 1942 to 1946 he obtained a BA in Psychology from the University of California, Los Angeles in 1948 and both his MA and PhD from the University of Southern California in Psychology in 1952. He returned to the army as a psychologist until 1956 after which he began a career working in and teaching information science. He died in California in 2012. == Information Science Career == After leaving the military Borko began working at the RAND Corporation as a Systems Training Specialist in 1956 and moved to the Systems Development Corporation a year later working in the Language Processing and Retrieval department. Alongside this work he taught Psychology at USC from 1957-65 and then moved into teaching Library Science at UCLA from 1965. In 1967 Borko left his role at the Systems Development Corporation and continued as a full-time professor at UCLA until his retirement in 1993.. From 1961 to 1995 Borko authored and co-authored over 100 articles on new developments in the field as well as the historiography of information science. He served as an editor of the Journal of Educational Data Processing from 1963-1975 and as President of the American Society for Information Science in 1966 == Partial list of works == Borko, H. (1962, May). The construction of an empirically based mathematically derived classification system. In Proceedings of the May 1-3, 1962, spring joint computer conference (pp. 279-289). Borko, H., & Bernick, M. (1963). Automatic document classification. Journal of the ACM (JACM), 10(2), 151-162. Borko, H. (1964). The Storage and Retrieval of Educational Information. Journal of Teacher Education, 15(4), 449-452. Borko, H. (1964). Measuring the reliability of subject classification by men and machines. American Documentation, 15(4), 268-273. Borko, H. (1965). The conceptual foundations of information systems. Borko, H. (1968), Information science: What is it?†. Amer. Doc., 19: 3-5. https://doi.org/10.1002/asi.5090190103 Borko, H. (1970). Experiments in book indexing by computer. Information storage and retrieval, 6(1), 5-16. Borko, H. (1985). An introduction to computer-based library systems (Lucy A. Tedd). Education for Information, 3(1), 61.

    Read more →
  • Query rewriting

    Query rewriting

    Query rewriting is a typically automatic transformation that takes a set of database tables, views, and/or queries, usually indices, often gathered data and query statistics, and other metadata, and yields a set of different queries, which produce the same results but execute with better performance (for example, faster, or with lower memory use). Query rewriting can be based on relational algebra or an extension thereof (e.g. multiset relational algebra with sorting, aggregation and three-valued predicates i.e. NULLs as in the case of SQL). The equivalence rules of relational algebra are exploited, in other words, different query structures and orderings can be mathematically proven to yield the same result. For example, filtering on fields A and B, or cross joining R and S can be done in any order, but there can be a performance difference. Multiple operations may be combined, and operation orders may be altered. The result of query rewriting may not be at the same abstraction level or application programming interface (API) as the original set of queries (though often is). For example, the input queries may be in relational algebra or SQL, and the rewritten queries may be closer to the physical representation of the data, e.g. array operations. Query rewriting can also involve materialization of views and other subqueries; operations that may or may not be available to the API user. The query rewriting transformation can be aided by creating indices from which the optimizer can choose (some database systems create their own indexes if deemed useful), mandating the use of specific indices, creating materialized and/or denormalized views, or helping a database system gather statistics on the data and query use, as the optimality depends on patterns in data and typical query usage. Query rewriting may be rule based or optimizer based. Some sources discuss query rewriting as a distinct step prior to optimization, operating at the level of the user accessible algebra API (e.g. SQL). There are other, largely unrelated concepts also named similarly, for example, query rewriting by search engines.

    Read more →
  • Tribute (website)

    Tribute (website)

    Tribute is an American video-sharing website headquartered in Brooklyn. Created in 2014 by Andrew Horn and Rory Petty, the platform lets customers create video montages (called "tributes") for occasions including weddings, birthdays, anniversaries, get well soon, and memorials. Tribute.co allows users to record video messages, request submissions from friends and family, insert photos, add music, and send the resulting video tribute montage to a recipient. == Overview == Tribute's collaborative technology starts with inviting people to contribute via email, SMS or social media. Participants receive a prompt to record a short video via their phone, computer or tablet. The site's video editing software allows users to drag and drop the clips in their desired order without prior video editing experience. == History == When Andrew Horn turned twenty-seven, his girlfriend, Miki Agrawal surprised him with a video montage containing clips of his family and closest friends explaining why they loved him. This resulted in Andrew's idea to create Tribute–a "living eulogy" video-compilation service that he co-founded with software engineer Rory Petty. Founded in 2014, Tribute's activity accelerated in 2020 due to the COVID-19 pandemic, and it had sent over 5 million videos as of December 2021. While social distance restrictions were in effect, the site provided a way for people to connect while in-person celebrations were put on hold. For each video sold, Tribute makes one available to hospitals for free and has partnered with Cleveland Clinic Cancer Center in Ohio, Lurie Children's Hospital in Illinois and CarePoint Health in New Jersey.

    Read more →
  • Microsoft Query

    Microsoft Query

    Microsoft Query is a visual method of creating database queries using examples based on a text string, the name of a document or a list of documents. The QBE system converts the user input into a formal database query using Structured Query Language (SQL) on the backend, allowing the user to perform powerful searches without having to explicitly compose them in SQL, and without even needing to know SQL. It is derived from Moshé M. Zloof's original Query by Example (QBE) implemented in the mid-1970s at IBM's Research Centre in Yorktown, New York. In the context of Microsoft Access, QBE is used for introducing students to database querying, and as a user-friendly database management system for small businesses. Microsoft Excel allows results of QBE queries to be embedded in spreadsheets.

    Read more →
  • VMDS

    VMDS

    VMDS abbreviates the relational database technology called Version Managed Data Store provided by GE Energy as part of its Smallworld technology platform and was designed from the outset to store and analyse the highly complex spatial and topological networks typically used by enterprise utilities such as power distribution and telecommunications. VMDS was originally introduced in 1990 as has been improved and updated over the years. Its current version is 6.0. VMDS has been designed as a spatial database. This gives VMDS a number of distinctive characteristics when compared to conventional attribute only relational databases. == Distributed server processing == VMDS is composed of two parts: a simple, highly scalable data block server called SWMFS (Smallworld Master File Server) and an intelligent client API written in C and Magik. Spatial and attribute data are stored in data blocks that reside in special files called data store files on the server. When the client application requests data it has sufficient intelligence to work out the optimum set of data blocks that are required. This request is then made to SWMFS which returns the data to the client via the network for processing. This approach is particularly efficient and scalable when dealing with spatial and topological data which tends to flow in larger volumes and require more processing then plain attribute data (for example during a map redraw operation). This approach makes VMDS well suited to enterprise deployment that might involve hundreds or even thousands of concurrent clients. == Support for long transactions == Relational databases support short transactions in which changes to data are relatively small and are brief in terms in duration (the maximum period between the start and the end of a transaction is typically a few seconds or less). VMDS supports long transactions in which the volume of data involved in the transaction can be substantial and the duration of the transaction can be significant (days, weeks or even months). These types of transaction are common in advanced network applications used by, for example, power distribution utilities. Due to the time span of a long transaction in this context the amount of change can be significant (not only within the scope of the transaction, but also within the context of the database as a whole). Accordingly, it is likely that the same record might be changed more than once. To cope with this scenario VMDS has inbuilt support for automatically managing such conflicts and allows applications to review changes and accept only those edits that are correct. == Spatial and topological capabilities == As well as conventional relational database features such as attribute querying, join fields, triggers and calculated fields, VMDS has numerous spatial and topological capabilities. This allows spatial data such as points, texts, polylines, polygons and raster data to be stored and analysed. Spatial functions include: find all features within a polygon, calculate the Voronoi polygons of a set of sites and perform a cluster analysis on a set of points. Vector spatial data such as points, polylines and polygons can be given topological attributes that allow complex networks to be modelled. Network analysis engines are provided to answer questions such as find the shortest path between two nodes or how to optimize a delivery route (the travelling salesman problem). A topology engine can be configured with a set of rules that define how topological entities interact with each other when new data is added or existing data edited. == Data abstraction == In VMDS all data is presented to the application as objects. This is different from many relational databases that present the data as rows from a table or query result using say JDBC. VMDS provides a data modelling tool and underlying infrastructure as part of the Smallworld technology platform that allows administrators to associate a table in the database with a Magik exemplar (or class). Magik get and set methods for the Magik exemplar can be automatically generated that expose a table's field (or column). Each VMDS row manifests itself to the application as an instance of a Magik object and is known as an RWO (or real world object). Tables are known as collections in Smallworld parlance. # all_rwos hold all the rwos in the database and is heterogeneous all_rwos << my_application.rwo_set() # valve_collection holds the valve collection valves << all_rwos.select(:collection, {:valve}) number_of_valves << valves.size Queries are built up using predicate objects: # find 'open' valves. open_valves << valves.select(predicate.eq(:operating_status, "open")) number_of_open_valves << open_valves.size _for valve _over open_valves.elements() _loop write(valve.id) _endloop Joins are implemented as methods on the parent RWO. For example, a manager might have several employees who report to him: # get the employee collection. employees << my_application.database.collection(:gis, :employees) # find a manager called 'Steve' and get the first matching element steve << employees.select(predicate.eq(:name, "Steve").and(predicate.eq(:role, "manager")).an_element() # display the names of his direct reports. name is a field (or column) # on the employee collection (or table) _for employee _over steve.direct_reports.elements() _loop write(employee.name) _endloop Performing a transaction: # each key in the hash table corresponds to the name of the field (or column) in # the collection (or table) valve_data << hash_table.new_with( :asset_id, 57648576, :material, "Iron") # get the valve collection directly valve_collection << my_application.database.collection(:gis, :valve) # create an insert transaction to insert a new valve record into the collection a # comment can be provide that describes the transaction transaction << record_transaction.new_insert(valve_collection, valve_data, "Inserted a new valve") transaction.run()

    Read more →
  • NCSA Brown Dog

    NCSA Brown Dog

    NCSA Brown Dog is a research project to develop a method for easily accessing historic research data stored in order to maintain the long-term viability of large bodies of scientific research. It is supported by the National Center for Supercomputing Applications (NCSA) that is funded by the National Science Foundation (NSF). == History == Brown Dog is part of the DataNet partners program funded by NSF in 2008. DataNet was conceived to address the increasingly digital and data-intensive nature of science, engineering and education. Brown Dog is part of a follow-on effort called Data Infrastructure Building Blocks (DIBBs), focused on building software to support DataNet. The project was proposed by researchers at NCSA and the University of Illinois Urbana-Champaign as well as researchers from Boston University and the University of North Carolina at Chapel Hill. == Unstructured, uncurated, long tail data == Much scientific data is smaller, unstructured and uncurated and thus not easily shared. Such data is sometimes referred to as "long tail" data. This borrows a term from statistics and refers to the tail of the distribution of project sizes. The majority of smaller projects lack the resources to properly steward the data they produce. This so-called "long tail" data, both past and present, has the potential to inform future research in many study areas. Much of this data has become inaccessible due to obsolete software and file formats. The resulting impossibility of reviewing data from older research disrupts the overall scientific research project. == Approach == Brown Dog describes itself as the "super mutt" of software (thus the name "Brown Dog"), serving as a low-level data infrastructure to interface digital data content across the internet. Its approach is to use every possible source of automated help (i.e., software) in existence in a robust and provenance-preserving manner to create a service that can deal with as much of this data as possible. The project sees the broader impact of its work in its potential to serve the general public as a sort of "DNS for data", with the goal of making all data and all file formats as accessible as webpages are today. == Technology == Brown Dog seeks to address problems involving the use of uncurated and unstructured data collections through the development of two services: the Data Access Proxy (DAP) to aid in the conversion of file formats and the Data Tilling Services (DTS) for the automatic extraction of metadata from file contents. Once developed, researchers and general public users will be able to download browser plugins and other tools from the Brown Dog tool catalog. === Data Tilling Service === Data Tilling Service (DTS) will allow users to search data collections using an existing file to discover other similar files in a collection. A DTS search field will be appended to configured browsers where example files can be dropped. This tells DTS to search all the files under a given URL for files similar to the dropped file. For example, while browsing an online image collection, a user could drop an image of three people into the search field, and the DTS would return all images in the collection that also contain three people. If DTS encounters a foreign file format, it will utilize DAP to make the file accessible. DTS also indexes the data and extract and appends metadata to files and collections enabling users to gain some sense of the type of data they are encountering. This service runs on port 9443. === Data Access Proxy === Data Access Proxy (DAP) allows users to access data files that would otherwise be unreadable. Similar to an internet gateway or Domain Name Service, the DAP configuration would be entered into a user's machine and browser settings. Data requests over HTTP would first be examined by DAP to determine if the native file format is readable on the client device. If not, DAP converts the file into the best available format readable by the client machine. Alternatively, the user could specify the desired format themselves. This service runs on port 8184. == Use cases == Brown Dog targets three use cases proposed by groups within the EarthCube research communities. Developers and researchers from these communities will work together on use cases that span geoscience, engineering, biology and social science. === Long tail vegetation data in ecology and global change biology === This use case is led by Michael Dietze, Boston University Data on the abundance, species composition, and size structure of vegetation is critically important for a wide array of sub-disciplines in ecology, conservation, natural resource management, and global change biology. However, addressing many of the pressing questions in these disciplines will require that terrestrial biosphere and hydrologic models are able to assimilate the large amount of long-tail data that exists but is largely inaccessible. The Brown Dog team in cooperation with researches from Dietze's lab will facilitate the capture of a huge body of smaller research-oriented vegetation data sets collected over many decades and historical vegetation data embedded in Public Land Survey data dating back to 1785. This data will be used as initial conditions for models, to make sense of other large data sets and for model calibration and validation. === Designing green infrastructure considering storm water and human requirements === This use case is led by Barbara Minsker], University of Illinois at Urbana-Champaign]; William Sullivan, University of Illinois at Urbana-Champaign; Arthur Schmidt, University of Illinois at Urbana-Champaign. This case study involves developing novel green infrastructure design criteria and models that integrate requirements for storm water management and ecosystem and human health and well being. To address the scientific and social problems associated with the design of green spaces, data accessibility and availability is a major challenge. This study will focus on identified areas of the Green Healthy Neighborhood Planning region within the City of Chicago where existing local sewer performance is most deficient and where changes in impervious area through green infrastructure would be beneficial to under served neighborhoods. Brown Dog will be used to extract long-tail experimental data on human landscape preferences and health impacts. This data will be used to develop a human health impacts model that will then be linked together with a terrestrial biosphere model and a storm water model using Brown Dog technology. === Development and application for critical zone studies === This use case is led by Praveen Kumar, University of Illinois at Urbana-Champaign Critical Zone (CZ) is the "skin" of the earth that extends from the treetops to the bedrock that is created by life processes working at scales from microbes to biomes. The Critical Zone supports all terrestrial living systems. Its upper part is the bio-mantle. This is where terrestrial biota live, reproduce, use and expend energy, and where their wastes and remains accumulate and decompose. It encompasses the soil, which acts as a geomembrane through which water and solutes, energy, gases, solids, and organisms interact with the atmosphere, biosphere, hydrosphere, and lithosphere. A variety of drivers affect this bio-dynamic zone, ranging from climate and deforestation to agriculture, grazing and human development. Understanding and predicting these effects is central to managing and sustaining vital ecosystem services such as soil fertility, water purification, and production of food resources, and, at larger scales, global carbon cycling and carbon sequestration. The CZ provides a unifying framework for integrating terrestrial surface and near-surface environments, and reflects an intricate web of biological and chemical processes and human impacts occurring at vastly different temporal and spatial scales. The nature of these data create significant challenges for inter-disciplinary studies of the CZ because integration of the variety and number of data products and models has been a barrier. On the other hand, CZ data provides an excellent opportunity for defining, testing and implementing Brown Dog technologies. In this context "unstructured" data is viewed broadly as consisting of a collection of heterogeneous data with formats that reflect temporal and disciplinary legacies, data from emerging low cost open hardware based sensors and embedded sensor networks that lack well defined metadata and sensor characteristics, as well as data that are available as maps, images and text. == NSF Award == CIF21 DIBBs: Brown Dog was awarded in the winter of 2013 with a start date of October 1, 2013. Estimated expiration date is September 30, 2018. The award amount was $10,519,716.00, the largest DIBB award. The principal investigator is Kenton McHenry of NCSA at the University of Illinois at Urbana-Champaign. Coleaders are Jong Lee NCSA/UIU

    Read more →
  • Empirical risk minimization

    Empirical risk minimization

    In statistical learning theory, the principle of empirical risk minimization defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice (i.e. the "true risk") because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk". == Background == The following situation is a general setting of many supervised learning problems. There are two spaces of objects X {\displaystyle X} and Y {\displaystyle Y} and we would like to learn a function h : X → Y {\displaystyle \ h:X\to Y} (often called hypothesis) which outputs an object y ∈ Y {\displaystyle y\in Y} , given x ∈ X {\displaystyle x\in X} . To do so, there is a training set of n {\displaystyle n} examples ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} where x i ∈ X {\displaystyle x_{i}\in X} is an input and y i ∈ Y {\displaystyle y_{i}\in Y} is the corresponding response that is desired from h ( x i ) {\displaystyle h(x_{i})} . To put it more formally, assuming that there is a joint probability distribution P ( x , y ) {\displaystyle P(x,y)} over X {\displaystyle X} and Y {\displaystyle Y} , and that the training set consists of n {\displaystyle n} instances ( x 1 , y 1 ) , … , ( x n , y n ) {\displaystyle \ (x_{1},y_{1}),\ldots ,(x_{n},y_{n})} drawn i.i.d. from P ( x , y ) {\displaystyle P(x,y)} . The assumption of a joint probability distribution allows for the modelling of uncertainty in predictions (e.g. from noise in data) because y {\displaystyle y} is not a deterministic function of x {\displaystyle x} , but rather a random variable with conditional distribution P ( y | x ) {\displaystyle P(y|x)} for a fixed x {\displaystyle x} . It is also assumed that there is a non-negative real-valued loss function L ( y ^ , y ) {\displaystyle L({\hat {y}},y)} which measures how different the prediction y ^ {\displaystyle {\hat {y}}} of a hypothesis is from the true outcome y {\displaystyle y} . For classification tasks, these loss functions can be scoring rules. The risk associated with hypothesis h ( x ) {\displaystyle h(x)} is then defined as the expectation of the loss function: R ( h ) = E [ L ( h ( x ) , y ) ] = ∫ L ( h ( x ) , y ) d P ( x , y ) . {\displaystyle R(h)=\mathbf {E} [L(h(x),y)]=\int L(h(x),y)\,dP(x,y).} A loss function commonly used in theory is the 0-1 loss function: L ( y ^ , y ) = { 1 if y ^ ≠ y 0 if y ^ = y {\displaystyle L({\hat {y}},y)={\begin{cases}1&{\mbox{ if }}\quad {\hat {y}}\neq y\\0&{\mbox{ if }}\quad {\hat {y}}=y\end{cases}}} . The ultimate goal of a learning algorithm is to find a hypothesis h ∗ {\displaystyle h^{}} among a fixed class of functions H {\displaystyle {\mathcal {H}}} for which the risk R ( h ) {\displaystyle R(h)} is minimal: h ∗ = a r g m i n h ∈ H R ( h ) . {\displaystyle h^{}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,{R(h)}.} For classification problems, the Bayes classifier is defined to be the classifier minimizing the risk defined with the 0–1 loss function. == Formal definition == In general, the risk R ( h ) {\displaystyle R(h)} cannot be computed because the distribution P ( x , y ) {\displaystyle P(x,y)} is unknown to the learning algorithm. However, given a sample of iid training data points, we can compute an estimate, called the empirical risk, by computing the average of the loss function over the training set; more formally, computing the expectation with respect to the empirical measure: R emp ( h ) = 1 n ∑ i = 1 n L ( h ( x i ) , y i ) . {\displaystyle \!R_{\text{emp}}(h)={\frac {1}{n}}\sum _{i=1}^{n}L(h(x_{i}),y_{i}).} The empirical risk minimization principle states that the learning algorithm should choose a hypothesis h ^ {\displaystyle {\hat {h}}} which minimizes the empirical risk over the hypothesis class H {\displaystyle {\mathcal {H}}} : h ^ = a r g m i n h ∈ H R emp ( h ) . {\displaystyle {\hat {h}}={\underset {h\in {\mathcal {H}}}{\operatorname {arg\,min} }}\,R_{\text{emp}}(h).} Thus, the learning algorithm defined by the empirical risk minimization principle consists in solving the above optimization problem. == Properties == Guarantees for the performance of empirical risk minimization depend strongly on the function class selected as well as the distributional assumptions made. In general, distribution-free methods are too coarse, and do not lead to practical bounds. However, they are still useful in deriving asymptotic properties of learning algorithms, such as consistency. In particular, distribution-free bounds on the performance of empirical risk minimization given a fixed function class can be derived using bounds on the VC complexity of the function class. For simplicity, considering the case of binary classification tasks, it is possible to bound the probability of the selected classifier, ϕ n {\displaystyle \phi _{n}} being much worse than the best possible classifier ϕ ∗ {\displaystyle \phi ^{}} . Consider the risk L {\displaystyle L} defined over the hypothesis class C {\displaystyle {\mathcal {C}}} with growth function S ( C , n ) {\displaystyle {\mathcal {S}}({\mathcal {C}},n)} given a dataset of size n {\displaystyle n} . Then, for every ϵ > 0 {\displaystyle \epsilon >0} : P ( L ( ϕ n ) − L ( ϕ ∗ ) > ϵ ) ≤ 8 S ( C , n ) exp ⁡ { − n ϵ 2 / 32 } {\displaystyle \mathbb {P} \left(L(\phi _{n})-L(\phi ^{})>\epsilon \right)\leq {\mathcal {8}}S({\mathcal {C}},n)\exp\{-n\epsilon ^{2}/32\}} Similar results hold for regression tasks. These results are often based on uniform laws of large numbers, which control the deviation of the empirical risk from the true risk, uniformly over the hypothesis class. === Impossibility results === It is also possible to show lower bounds on algorithm performance if no distributional assumptions are made. This is sometimes referred to as the No free lunch theorem. Even though a specific learning algorithm may provide the asymptotically optimal performance for any distribution, the finite sample performance is always poor for at least one data distribution. This means that no classifier can improve on the error for a given sample size for all distributions. Specifically, let ϵ > 0 {\displaystyle \epsilon >0} and consider a sample size n {\displaystyle n} and classification rule ϕ n {\displaystyle \phi _{n}} , there exists a distribution of ( X , Y ) {\displaystyle (X,Y)} with risk L ∗ = 0 {\displaystyle L^{}=0} (meaning that perfect prediction is possible) such that: E L n ≥ 1 / 2 − ϵ . {\displaystyle \mathbb {E} L_{n}\geq 1/2-\epsilon .} It is further possible to show that the convergence rate of a learning algorithm is poor for some distributions. Specifically, given a sequence of decreasing positive numbers a i {\displaystyle a_{i}} converging to zero, it is possible to find a distribution such that: E L n ≥ a i {\displaystyle \mathbb {E} L_{n}\geq a_{i}} for all n {\displaystyle n} . This result shows that universally good classification rules do not exist, in the sense that the rule must be low quality for at least one distribution. === Computational complexity === Empirical risk minimization for a classification problem with a 0-1 loss function is known to be an NP-hard problem even for a relatively simple class of functions such as linear classifiers. Nevertheless, it can be solved efficiently when the minimal empirical risk is zero, i.e., data is linearly separable. In practice, machine learning algorithms cope with this issue either by employing a convex approximation to the 0–1 loss function (like hinge loss for SVM), which is easier to optimize, or by imposing assumptions on the distribution P ( x , y ) {\displaystyle P(x,y)} (and thus stop being agnostic learning algorithms to which the above result applies). In the case of convexification, Zhang's lemma majors the excess risk of the original problem using the excess risk of the convexified problem. Minimizing the latter using convex optimization also allow to control the former. == Tilted empirical risk minimization == Tilted empirical risk minimization is a machine learning technique used to modify standard loss functions like squared error, by introducing a tilt parameter. This parameter dynamically adjusts the weight of data points during training, allowing the algorithm to focus on specific regions or characteristics of the data distribution. Tilted empirical risk minimization is particularly useful in scenarios with imbalanced data or when there is a need to emphasize errors in certain parts of the prediction space.

    Read more →
  • Penril

    Penril

    Penril DataComm Networks, Inc. was a computer telecommunications hardware company that made some acquisitions and was eventually split into two parts: one was acquired by Bay Networks and the other was a newly formed company named Access Beyond. The focus of both company's products was end-to-end data transfer. By the mid-1990s, with the popularization of the internet, this was no longer of wide interest. == History == Penril, whose earnings reports and other financials were followed by The New York Times in the 1990s, made several acquisitions but also grew internally. Following its Datability acquisition it renamed itself Penril Datability Networks. By the time the 1968-founded Penril was acquired by Bay their name was Penril DataComm Networks. The company, which as of 1985 "had made 14 acquisitions in 12 years," also had done extensive work regarding quality control, and leveraged their product line by what The Washington Post called clever packaging: "software, cables, instructions and telephone support" sold to those less technically skilled as "Network in a Box." == Datability == Datability Software Systems Inc. was the initial name of what by 1991 became 'Datability, Inc.', "a manufacturer of hardware that links computer networks." The 1977-founded firm began as a software consulting company, especially in the area of databases. To speed up project development they built a program generator, which they marketed as Control 10/20 (targeted at users of Digital Equipment Corporation's DECsystem-10 and DECSYSTEM-20). After trying their hand at time-sharing they built hardware to enhance bridging these computers to DEC's VAX product line. In particular they focused on Digital's LAT protocol, selling "boxes" that reimplemented the protocol, at a lower price than DEC's. They later expanded into other areas of telecommunications hardware The firm relocated to a larger manufacturing plant in 1991 and was acquired by Penril in 1993. == Access Beyond == Access Beyond was initially housed by Penril, from which it was spun off. A securities analyst noted that Access began operations with no debt. They subsequently merged with Hayes Corporation. Some of the funds brought to the merger came from a sale by Penril of two of its divisions, each bringing about $4 million. == Ron Howard == Ron Howard, founder of Datability, became part of Penril when the latter acquired the former, and was CEO of Access Beyond when it was spun off by Penril. Access merged with Hayes Microcomputer Products and was renamed Hayes Corp, at which time Howard became executive VP of business development and corporate vice chairman of Hayes. == People == In the matter of hiring immigrants, in an industry where recent arrivals came from a culture of six day work weeks, and subcontracting was then common, these assembly line workers at Penril comprised about 25%, compared to double in other firms. Placement was overseen by government agencies. == Controversy == Penril had a joint development agreement, beginning in 1990, with a Standard Microsystems Corporation (SMSC) subsidiary. A dispute arose, and the matter was brought to court. Penril was awarded $3.5 million in 1996.

    Read more →
  • Information strategist

    Information strategist

    An information strategist analyses the information flow within an organisation and directs its information resources to better serve the organisation's strategic goals. They work with information technology or within a corporate library to direct high quality information from a variety of sources to users, based upon their profiles and needs. In warfare, information strategists not only seek to improve information flows for their own side but also try to disrupt the information flows of the enemy in order to demoralize and deceive them.

    Read more →