AI Data Training Jobs

AI Data Training Jobs — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cross-language information retrieval

    Cross-language information retrieval

    Cross-language information retrieval (CLIR) is a subfield of information retrieval dealing with retrieving information written in a language different from the language of the user's query. The term "cross-language information retrieval" has many synonyms, of which the following are perhaps the most frequent: cross-lingual information retrieval, translingual information retrieval, multilingual information retrieval. The term "multilingual information retrieval" refers more generally both to technology for retrieval of multilingual collections and to technology which has been moved to handle material in one language to another. The term Multilingual Information Retrieval (MLIR) involves the study of systems that accept queries for information in various languages and return objects (text, and other media) of various languages, translated into the user's language. Cross-language information retrieval refers more specifically to the use case where users formulate their information need in one language and the system retrieves relevant documents in another. To do so, most CLIR systems use various translation techniques. CLIR techniques can be classified into different categories based on different translation resources: Dictionary-based CLIR techniques Parallel corpora based CLIR techniques Comparable corpora based CLIR techniques Machine translator based CLIR techniques CLIR systems have improved so much that the most accurate multi-lingual and cross-lingual adhoc information retrieval systems today are nearly as effective as monolingual systems. Other related information access tasks, such as media monitoring, information filtering and routing, sentiment analysis, and information extraction require more sophisticated models and typically more processing and analysis of the information items of interest. Much of that processing needs to be aware of the specifics of the target languages it is deployed in. Mostly, the various mechanisms of variation in human language pose coverage challenges for information retrieval systems: texts in a collection may treat a topic of interest but use terms or expressions which do not match the expression of information need given by the user. This can be true even in a mono-lingual case, but this is especially true in cross-lingual information retrieval, where users may know the target language only to some extent. The benefits of CLIR technology for users with poor to moderate competence in the target language has been found to be greater than for those who are fluent. Specific technologies in place for CLIR services include morphological analysis to handle inflection, decompounding or compound splitting to handle compound terms, and translations mechanisms to translate a query from one language to another. The first workshop on CLIR was held in Zürich during the SIGIR-96 conference. Workshops have been held yearly since 2000 at the meetings of the Cross Language Evaluation Forum (CLEF). Researchers also convene at the annual Text Retrieval Conference (TREC) to discuss their findings regarding different systems and methods of information retrieval, and the conference has served as a point of reference for the CLIR subfield. Early CLIR experiments were conducted at TREC-6, held at the National Institute of Standards and Technology (NIST) on November 19–21, 1997. Google Search had a cross-language search feature that was removed in 2013.

    Read more →
  • Clara.io

    Clara.io

    Clara.io is web-based freemium 3D computer graphics software developed by Exocortex, a Canadian software company. The free or "Basic" component of their freemium offering, however, places severe restrictions, such as on saving models and importing texture maps, which are undisclosed in the company's own descriptions of their plans.vf TMN == History == Clara.io was announced in July 2013, and first presented as part of the official SIGGRAPH 2013 program later that month. By November 2013, when the open beta period started, Clara.io had 14,000 registered users. Clara.io claimed to have 26,000 registered users in January 2014, which grew to 85,000 by December 2014. Clara.io was permanently shut down on December 31, 2022, but the site is currently still partially functional to logged-in users. == Features == Polygonal modeling Constructive solid geometry Key frame animation Skeletal animation Hierarchical scene graph Texture mapping Photorealistic rendering (streaming cloud rendering using V-Ray Cloud) Scene publishing via HTML iframe embedding FBX, Collada, OBJ, STL and Three.js import/export Collaborative real-time editing Revision control (versioning & history) Scripting, Plugins & REST APIs 3D model library Unlisted and Private scenes (paid subscriptions only). == Technology == Clara.io is developed using HTML5, JavaScript, WebGL and Three.js. Clara.io does not rely on any browser plugins and thus runs on any platform that has a modern standards compliant browser. == Screenshots ==

    Read more →
  • Apache Drill

    Apache Drill

    Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Drill supports a variety of NoSQL databases and file systems, including Alluxio, HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality, if Drill and the datastore are on the same nodes. Tom Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016. == Features == One explicitly stated design goal is that Drill is able to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. Schema-free JSON document model similar to MongoDB and Elasticsearch, without requiring a formal schema to be declared Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs Extremely user and developer friendly Pluggable architecture enables connectivity to multiple datastores Version 1.9 added dynamic user-defined functions Version 1.11 added cryptographic-related functions and PCAP file format support == Back-end support == Drill is primarily focused on non-relational datastores, including Apache Hadoop text files, NoSQL, and cloud storage. A notable feature also includes in situ querying of local JSON and Apache Parquet files. Some additional datastores that it supports include: All Hadoop distributions (HDFS API 2.3+), including Apache Hadoop, MapR, CDH and Amazon EMR NoSQL: MongoDB, Apache HBase, Apache Cassandra Online Analytical Processing: Apache Kudu, Apache Druid, OpenTSDB Cloud storage: Amazon S3, Google Cloud Storage, Azure Blob Storage, Swift, IBM Cloud Object Storage Diverse data formats, including Apache Avro, Apache Parquet and JSON RDBMs storage plugins (Using JDBC to connect to MySQL, PostgreSQL, and others) A new datastore can be added by developing a storage plugin. Drill's "schema-free" JSON data model enables it to query non-relational datastores in-situ . == Front-end support == Drill itself can be queried via JDBC, ODBC, or REST through a variety of methods and languages including Python and Java. The default install includes a web interface allowing end-users to execute ANSI SQL directly and export data tables as CSV files without any programming. The dashboard library, Apache Superset, is particularly well suited for visualization of data queried with Drill.

    Read more →
  • Containerization (computing)

    Containerization (computing)

    In software engineering, containerization is operating-system-level virtualization or application-level virtualization over multiple resources so that software applications can run in isolated user spaces called containers in any cloud or non-cloud environment, regardless of type or vendor. The term "container" has different meanings in different contexts, and it is important to ensure that the intended definition aligns with the audience's understanding. == Usage == Each container is basically a fully functional and portable cloud or non-cloud computing environment surrounding the application and keeping it independent of other environments running in parallel. Individually, each container simulates a different software application and runs isolated processes by bundling related configuration files, libraries and dependencies. But, collectively, multiple containers share a common operating system kernel (OS). In recent times, containerization technology has been widely adopted by cloud computing platforms like Amazon Web Services, Microsoft Azure, Google Cloud Platform, and IBM Cloud. Containerization has also been pursued by the U.S. Department of Defense as a way of more rapidly developing and fielding software updates, with first application in its F-22 air superiority fighter. == History == The concept of containerization in computing originated from early operating system–level isolation mechanisms. One of the earliest implementations was the chroot system call introduced in Version 7 Unix in 1979, which changed the apparent root directory for a process and its children, providing a basic form of filesystem isolation. In the early 2000s, more advanced forms of operating system–level virtualization were developed. FreeBSD introduced "jails" in 2000, which extended isolation by restricting processes to a subset of system resources. Around the same time, Solaris introduced "zones" (also known as Solaris Containers), providing similar capabilities with resource management and isolation features. Linux later incorporated comparable functionality through kernel features such as namespaces and control groups (cgroups), which enabled isolation of process IDs, network stacks, filesystems, and resource allocation. These features formed the foundation for Linux Containers (LXC), which provided a userspace interface for managing containers. The widespread adoption of containerization accelerated with the release of Docker in 2013, which introduced a standardized format for packaging applications and their dependencies, along with tooling for image distribution and container management. == Types of containers == OS containers Application containers == Security issues == Because of the shared OS, security threats can affect the whole containerized system. In containerized environments, security scanners generally protect the OS, but not the application containers, which adds unwanted vulnerability. == Container management, orchestration, clustering == Container orchestration or container management is mostly used in the context of application containers. Implementations providing such orchestration include Kubernetes and Docker swarm. == Container cluster management == Container clusters need to be managed. This includes functionality to create a cluster, to upgrade the software or repair it, balance the load between existing instances, scale by starting or stopping instances to adapt to the number of users, to log activities and monitor produced logs or the application itself by querying sensors. Open-source implementations of such software include OKD and Rancher. Quite a number of companies provide container cluster management as a managed service, like Alibaba, Amazon, Google, and Microsoft.

    Read more →
  • Content Security Policy

    Content Security Policy

    Content Security Policy (CSP) is a computer security standard introduced to prevent cross-site scripting (XSS), clickjacking and other code injection attacks resulting from execution of malicious content in the trusted web page context. It is a Candidate Recommendation of the W3C working group on Web Application Security, widely supported by modern web browsers. CSP provides a standard method for website owners to declare approved origins of content that browsers should be allowed to load on that website—covered types are JavaScript, CSS, HTML frames, web workers, fonts, images, embeddable objects such as Java applets, ActiveX, audio and video files, and other HTML5 features. == Status == The standard, originally named Content Restrictions, was proposed by Robert Hansen in 2004, first implemented in Firefox 4 and quickly picked up by other browsers. Version 1 of the standard was published in 2012 as W3C candidate recommendation and quickly with further versions (Level 2) published in 2014. As of 2023, the draft of Level 3 is being developed with the new features being quickly adopted by the web browsers. The following header names are in use as part of experimental CSP implementations: Content-Security-Policy – standard header name proposed by the W3C document. Google Chrome supports this as of version 25. Firefox supports this as of version 23, released on 6 August 2013. WebKit supports this as of version 528 (nightly build). Chromium-based Microsoft Edge support is similar to Chrome's. X-WebKit-CSP – deprecated, experimental header introduced into Google Chrome, Safari and other WebKit-based web browsers in 2011. X-Content-Security-Policy – deprecated, experimental header introduced in Gecko 2 based browsers (Firefox 4 to Firefox 22, Thunderbird 3.3, SeaMonkey 2.1). A website can declare multiple CSP headers, also mixing enforcement and report-only ones. Each header will be processed separately by the browser. CSP can also be delivered within the HTML code using a meta tag, although in this case its effectiveness will be limited. Internet Explorer 10 and Internet Explorer 11 also support CSP, but only sandbox directive, using the experimental X-Content-Security-Policy header. A number of web application frameworks support CSP, for example AngularJS (natively) and Django (middleware). Instructions for Ruby on Rails have been posted by GitHub. Web framework support is however only required if the CSP contents somehow depend on the web application's state—such as usage of the nonce origin. Otherwise, the CSP is rather static and can be delivered from web application tiers above the application, for example on load balancer or web server. === Bypasses === In December 2015 and December 2016, a few methods of bypassing 'nonce' allowlisting origins were published. In January 2016, another method was published, which leverages server-wide CSP allowlisting to exploit old and vulnerable versions of JavaScript libraries hosted at the same server (frequent case with CDN servers). In May 2017 one more method was published to bypass CSP using web application frameworks code. == Mode of operation == If the Content-Security-Policy header is present in the server response, a compliant client enforces the declarative allowlist policy. One example goal of a policy is a stricter execution mode for JavaScript in order to prevent certain cross-site scripting attacks. In practice this means that a number of features are disabled by default: Inline JavaScript code