Rating Timelines & Score Report Options

Through your CELPIP Account, you can access your score results online. You can also use your CELPIP Account to order priority shipping, order printed copies of your score report, and order additional copies of your score report:

 

Rating Type/Request and Description
Official Score Report
  • Scores are available online in 4-5 calendar days.
  • If you wish to submit your scores to an organization (e.g., Immigration, Refugees, and Citizenship Canada), please download a PDF copy of your CELPIP Official Score Report from your CELPIP Account.

Order a Hardcopy Official Score Report

(Online – Sign in to CELPIP Account)

You can order hardcopy Official Score Reports for a fee of $20.00 CAD each.

  • A hardcopy Official Score Report can be requested within 2 years of the test date.
  • The length of time that your scores are considered valid by various institutions, however, is determined by their individual policies. You can obtain this information from these institutions.
  • Once requested, a print copy of your CELPIP Official Score Report will be sent to your registered address via Canada Post’s standard delivery service.
  • Priority Shipping is available for an additional fee, and a tracking number will be provided.

Re-evaluation Request
(Online – Sign in to CELPIP Account)

  • You can apply for a re-evaluation of some or all components of your CELPIP-General Test within six months of the test date.
  • You must pay a re-evaluation fee at the time of your application that depends on which components of the test you would like re-evaluated.
  • If the CELPIP-General level changes for any component that has been re-evaluated, the re-evaluation fee for this component will be refunded. Please note that there is a limit of one re-evaluation for any particular component of the test.
  • Re-evaluation requests are final sale, and cannot be cancelled once the request has been submitted.
  • Test takers who apply for a re-evaluation will be notified of the results approximately one to two weeks after submitting their application and paying the re-evaluation fee.
  • Please note that requesting a re-evaluation of the Listening and Reading components is unlikely to result in a change in your scores as they are computer rated.
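The refund rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule (the fee for a component is refunded when that component's level changes). The function name, the dictionary layout, and the fee amounts are hypothetical placeholders, since the actual fees depend on the components selected and are not listed here.

```python
def reevaluation_refund(fees_paid, original_levels, reevaluated_levels):
    """Sum the fees for components whose CELPIP level changed on re-evaluation.

    fees_paid maps each re-evaluated component to the fee paid for it.
    The amounts used in the example below are placeholders, not real fees.
    """
    refund = 0.0
    for component, fee in fees_paid.items():
        if reevaluated_levels[component] != original_levels[component]:
            refund += fee
    return refund


# Hypothetical example: Writing changes from level 7 to 8, Speaking stays at 8.
fees = {"Writing": 50.0, "Speaking": 50.0}  # placeholder amounts
before = {"Writing": 7, "Speaking": 8}
after = {"Writing": 8, "Speaking": 8}
print(reevaluation_refund(fees, before, after))  # 50.0
```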

Understanding Your Test Scores

Each component of the CELPIP-General Test and the CELPIP-General LS Test is given a CELPIP level.

Below is a chart of each CELPIP level and its corresponding description. Since the CELPIP test scores have been calibrated against the Canadian Language Benchmarks (CLB) levels, we have included the CLB level equivalencies for your information.

Test Level Descriptor | CELPIP Level | CLB Level
Advanced proficiency in workplace and community contexts | 12 | 12
Advanced proficiency in workplace and community contexts | 11 | 11
Highly effective proficiency in workplace and community contexts | 10 | 10
Effective proficiency in workplace and community contexts | 9 | 9
Good proficiency in workplace and community contexts | 8 | 8
Adequate proficiency in workplace and community contexts | 7 | 7
Developing proficiency in workplace and community contexts | 6 | 6
Acquiring proficiency in workplace and community contexts | 5 | 5
Adequate proficiency for daily life activities | 4 | 4
Some proficiency in limited contexts | 3 | 3
Minimal proficiency or insufficient information to assess | M | 0, 1, 2
Not Administered: test taker did not receive this test component | NA | /

Performance Standards

Speaking: Categories and Factors
1. Content/Coherence:
  • Number of ideas
  • Quality of ideas
  • Organization of ideas
  • Examples and supporting details
2. Vocabulary:
  • Word choice
  • Precision and accuracy
  • Range of words and phrases
  • Suitable use of words and phrases
3. Listenability:
  • Rhythm, pronunciation, and intonation
  • Pauses, interjections, and self-correction
  • Grammar and sentence structure
  • Variety of sentence structure
4. Task Fulfillment:
  • Relevance
  • Completeness
  • Tone
  • Length
Writing: Categories and Factors
1. Content/Coherence:
  • Number of ideas
  • Quality of ideas
  • Organization of ideas
  • Examples and supporting details
2. Vocabulary:
  • Word choice
  • Suitable use of words and phrases
  • Range of words and phrases
  • Precision and accuracy
3. Readability:
  • Format and paragraphing
  • Connectors and transitions
  • Grammar and sentence structure
  • Spelling and punctuation
4. Task Fulfillment:
  • Relevance
  • Completeness
  • Tone
  • Word count
CELPIP Level | Listening Score /38
10-12 | 35-38
9 | 33-35
8 | 30-33
7 | 27-31
6 | 22-28
5 | 17-23
4 | 11-18
3 | 7-12
M | 0-7

DISCLAIMER: This example chart shows how scores in the Listening Test approximately correspond to CELPIP Levels. Since questions may have different levels of difficulty and may therefore be equated differently, the raw score required for a certain level may vary slightly from one test to another.
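Because the ranges in the example chart overlap, a single raw score can correspond to more than one CELPIP level. The following Python sketch simply looks up every level whose example range contains a given raw Listening score. The name `candidate_levels` is hypothetical, and the ranges are the approximate ones from the chart above, not an official conversion.

```python
# Approximate ranges copied from the example Listening chart above.
# Ranges overlap, so one raw score can map to more than one level.
LISTENING_CHART = [
    ("10-12", 35, 38),
    ("9", 33, 35),
    ("8", 30, 33),
    ("7", 27, 31),
    ("6", 22, 28),
    ("5", 17, 23),
    ("4", 11, 18),
    ("3", 7, 12),
    ("M", 0, 7),
]


def candidate_levels(raw_score, chart=LISTENING_CHART):
    """Return every CELPIP level whose example range contains raw_score."""
    if not 0 <= raw_score <= 38:
        raise ValueError("raw score must be between 0 and 38")
    return [level for level, low, high in chart if low <= raw_score <= high]


print(candidate_levels(36))  # ['10-12']
print(candidate_levels(33))  # ['9', '8']
```

A score of 33 falls in the example ranges for both level 9 and level 8, which is why the chart can only be read as approximate.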

CELPIP Level | Reading Score /38
10-12 | 33-38
9 | 31-33
8 | 28-31
7 | 24-28
6 | 19-25
5 | 15-20
4 | 10-16
3 | 8-11
M | 0-7

DISCLAIMER: This example chart shows how scores in the Reading Test approximately correspond to CELPIP Levels. Since questions may have different levels of difficulty and may therefore be equated differently, the raw score required for a certain level may vary slightly from one test to another.

All CELPIP Reading and Listening questions are in multiple-choice or a similar format. All answers to CELPIP Reading and Listening questions are scored dichotomously: a response is either correct or incorrect. Questions that are left blank are scored as incorrect. All the scoring is done by computer.
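As a rough illustration of dichotomous scoring, the Python sketch below counts one point for each correct response and treats a blank response as incorrect. The function name and the use of `None` for a blank answer are assumptions made for this example. The sketch does not model unscored items or any later scaling of the raw score.

```python
def raw_score(responses, answer_key):
    """Count correct answers; a blank (None) response is scored as incorrect."""
    if len(responses) != len(answer_key):
        raise ValueError("responses and answer_key must have the same length")
    return sum(
        1
        for given, correct in zip(responses, answer_key)
        if given is not None and given == correct
    )


key = ["A", "C", "B", "D"]
answers = ["A", None, "B", "A"]  # second question left blank
print(raw_score(answers, key))  # 2
```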

The writing and speaking components of the CELPIP-General Test are scored by qualified raters trained to apply consistent criteria to assess test taker performances based on standard scoring rubrics. Raters receive ongoing training and regular monitoring. Paragon uses rater agreement statistics to determine the quality of ratings; for a given test taker, a rater agrees with the other raters of this test taker if their rating is sufficiently close to that of the other raters (i.e., consensus).

English Proficiency
  • Native speaker of English
  • – or –
  • Non-native speaker of English with a CLB 11/12 English language proficiency
Education
  • A minimum of an undergraduate degree
Teaching & Assessment Experience
  • ESL teaching certification recognized by TESL Canada
  • – or –
  • Graduate training in language education or in linguistics
  • – or –
  • A minimum of 3 years of experience in ESL teaching or language education
  • – or –
  • A minimum of 3 years of experience in a linguistics related field
Residency
  • Resident in Canada at time of scoring

Raters receive ongoing training to ensure that the scoring criteria are consistently and systematically applied by all raters, and to minimize potential bias introduced by human judgment.

  1. Initial Rater Training

All raters attend an initial training program to guide them through Paragon’s rating approach. After completing a training manual, exercises, and rating samples, trainees engage in a certification process during which they rate 3-6 certification sets. In order to certify, trainees must achieve a minimum 80% agreement with the official score assigned to each performance in at least three consecutive sets. Only certified raters can start operational rating.
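The certification rule can be expressed as a small check. The Python sketch below assumes that agreement within a set means the share of performances where the trainee's rating exactly matches the official score. The source does not define agreement that precisely, so treat that definition, and both function names, as assumptions.

```python
def set_agreement(trainee_ratings, official_ratings):
    """Share of performances in one set where the trainee matches the official score."""
    if not official_ratings or len(trainee_ratings) != len(official_ratings):
        raise ValueError("both rating lists must be non-empty and of equal length")
    matches = sum(t == o for t, o in zip(trainee_ratings, official_ratings))
    return matches / len(official_ratings)


def is_certified(set_agreements, threshold=0.80, run_length=3):
    """True if at least `run_length` consecutive sets reach the agreement threshold."""
    streak = 0
    for agreement in set_agreements:
        streak = streak + 1 if agreement >= threshold else 0
        if streak >= run_length:
            return True
    return False


# Five certification sets: the last three in a row meet the 80% minimum.
print(is_certified([0.9, 0.7, 0.8, 0.85, 0.9]))  # True
```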

  1. Operational Rater Training

To maintain a shared perspective on relevant rating principles and criteria, all operational raters receive ongoing in-service training and monitoring, including:

  • Weekly feedback on their agreement with other raters
  • Weekly sample performances rated by expert raters
  • Biweekly in-depth training materials in the form of communications hosted online, including a range of sample performances rated and justified by expert raters
  • Detailed notes from rater seminars, in which challenging responses are discussed and rated by expert raters

  1. Rater Monitoring

Rater performance analysis is conducted monthly to monitor the reliability of the rater pool and to identify raters who have unsatisfactory rater agreement. Underperforming raters receive personalized feedback on rating samples that demonstrate a significant discrepancy between their ratings and benchmark ratings. Additional samples are provided upon request by a rater. Once identified as underperforming, a rater must demonstrate improvement within 8 weeks. If an underperforming rater does not improve enough to meet Paragon’s rating standards within that period, Paragon may terminate their rating contract.

All tests are randomly assigned to raters by an online system. Test taker anonymity is maintained at all times. Each test taker’s performance, i.e. a test taker’s responses to all tasks in the component, is assessed by multiple raters. Each CELPIP speaking performance is rated by a minimum of three speaking raters, and each CELPIP writing performance is rated by a minimum of four writing raters. Raters work independently of one another, and have no knowledge of the ratings assigned by other raters.

  1. Rating criteria

The rating dimensions that have been developed for the Writing and Speaking components are listed above on this page in the Performance Standards section:

Speaking: Content/Coherence, Vocabulary, Listenability, and Task Fulfillment

Writing: Content/Coherence, Vocabulary, Readability, and Task Fulfillment

Each dimension is divided into five performance levels. Performance descriptors are provided for each level in each dimension. Raters assign a level in each dimension by identifying tangible evidence in the test taker’s performance that matches the descriptors in the rating scale.

  1. Benchmarking

When the ratings of a test taker’s performance are complete, they are inspected for agreement. If the ratings are in disagreement, a benchmark rater is automatically assigned to assess the performance. All benchmark raters are experienced raters who have demonstrated consistent accuracy and reliability in rating. Benchmark raters have no knowledge of the initial ratings.
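The agreement check that triggers benchmarking could look something like the Python sketch below. The exact rule for deciding that ratings are "in disagreement" is not stated here, so the spread-based test and the `tolerance` value of 1 are purely hypothetical stand-ins.

```python
def needs_benchmark(ratings, tolerance=1):
    """Return True when independent ratings are spread wider than the tolerance.

    The tolerance value is hypothetical; the actual agreement rule is not published here.
    """
    if len(ratings) < 2:
        raise ValueError("at least two ratings are required")
    return max(ratings) - min(ratings) > tolerance


print(needs_benchmark([4, 4, 5]))  # False: ratings are within one level of each other
print(needs_benchmark([3, 5, 4]))  # True: a benchmark rater would be assigned
```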

  1. How is the final score determined?

The Speaking and Writing component scores are derived from the dimensional ratings assigned by the raters. These scores are then transformed into a CELPIP level. The transformation rules have been established by English language experts who participated in a standard setting exercise. Standard setting is an extensive, research-based process. Language experts work with testing professionals to identify what language learners need to be able to do at each performance level, such as CLB 8. The experts then analyze the test in detail and determine what level of performance a test taker needs to demonstrate for each CELPIP level. This process has established a defensible link between each Speaking and Writing component score and its corresponding CELPIP level.

Test Scoring FAQs

You can access your CELPIP Test scores online through your CELPIP Account within 4-5 calendar days of your test date. Please note that business days do not include weekends or holidays.

 

You will receive an email notification once your scores are available. Your CELPIP Official Score Report will be available for download as a PDF once your scores have been posted. To download your Score Report, please sign in to your CELPIP Account and click the “Check Scores” button. For information on obtaining test results, click here.

 

Your CELPIP Test scores can be accessed and viewed online in your CELPIP Account for a period of 2 years from your test date.

 


In rare instances, your CELPIP scores may be delayed. This could be for one of the following reasons: 

  • We determined that your test required additional review before scores could be released. We may contact you for more information.
  • You reported an issue to Paragon Testing Enterprises within 48 hours of your test. We may hold the scores if we are still investigating.
  • We require additional confirmation of your identity, or we are investigating possible testing irregularities.

If it’s been more than 5 calendar days since your test and you have not received an email from the CELPIP Office, please feel free to contact us directly for more information.

If it’s been 5 calendar days or less since your test, your scores should be posted soon! You will receive an email confirmation once your scores are available online.

For security reasons, there are many different test forms (versions of the test). Different test forms are administered to different test takers even if they take the test at the same time. Each test form will have some unique questions and possibly some questions that are shared with other forms. Paragon administers multiple forms to minimize the chance of somebody having access to the questions before their testing time, thereby gaining an unfair or undeserved score on the test.

New items are constantly being written. Before they can be used as scored items, they are pre-tested to ensure that they are equivalent in quality to existing items. Paragon includes some new items in every test. These items look the same as the scored items but they are not used to calculate your score. Paragon does not tell the test taker which questions will be unscored because it is important that test takers try their best for every item. This ensures that the data collected on the new items can be used to evaluate their quality. Only questions that have performed well will be used as scored items in the future.

While each test form contains different questions, each test form is constructed following explicit guidelines regarding content and difficulty. Paragon’s pre-testing and form creation procedures ensure that the different forms have approximately the same difficulty. However, the items are not identical from administration to administration, and this means that it is possible for there to be small differences in difficulty. It would be unfair to test takers if final test scores did not correct for these small differences. The process of score equating removes even this slight variation.

Score equating is the process of correcting final scores for slight variations in difficulty between test forms. For example, if one test taker correctly answers 30 questions on a relatively easy form and another test taker correctly answers 30 questions on a more difficult form, equating will correct for the disparity in form difficulty. It is of utmost importance that the scores reported for different test forms are comparable. We need to make sure that the final score reflects your underlying language proficiency and is not dependent on the difficulty of the questions you or someone else received.

The goal of any test is to provide a fair and accurate assessment of each test taker, regardless of the specific questions presented during the test. Though CELPIP tests are assembled following guidelines for content and difficulty, it is still possible for test forms to vary slightly in difficulty. Since a raw score is merely the sum of the questions that a test taker has correctly responded to, it cannot account for these slight variations in difficulty. Consequently, a raw score of 30 will not have the same meaning across different forms of a test. This means that different test takers’ raw scores could be hard to interpret and compare.

In order to account for differences between test forms, Paragon transforms test takers’ raw scores into a scaled score. Scaled scores adjust raw scores in a consistent way so that test takers’ scores on different test forms can be compared.
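The specific equating method Paragon uses is not described here. To show the general idea, the Python sketch below applies linear (mean-sigma) equating, a common textbook approach. It maps a raw score on one form onto the scale of a reference form by matching the means and standard deviations of the two score distributions. It assumes the two forms were taken by comparable groups of test takers. The function name and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(score, new_form_scores, reference_form_scores):
    """Map a raw score on a new form onto the reference form's scale.

    Linear equating matches the mean and standard deviation of the two
    score distributions. This is an illustrative method, not necessarily
    the one used operationally.
    """
    mu_new, sd_new = mean(new_form_scores), pstdev(new_form_scores)
    mu_ref, sd_ref = mean(reference_form_scores), pstdev(reference_form_scores)
    if sd_new == 0:
        raise ValueError("new form scores have no variation")
    return mu_ref + (sd_ref / sd_new) * (score - mu_new)


# Hypothetical data: the new form is harder, so its raw scores run 2 points lower.
harder_form = [20, 24, 28]
reference_form = [22, 26, 30]
print(linear_equate(28, harder_form, reference_form))  # 30.0
```

In this made-up example, a raw score of 28 on the harder form is treated as equivalent to a 30 on the reference form.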

After the Reading and Listening scaled scores have been determined, they are transformed into a CELPIP level. The transformation rules were established by English language experts who participated in a standard setting exercise. Standard setting is an extensive, research-based process. Language experts work with testing professionals to identify what language learners need to be able to do at each performance level, such as CLB 8. The experts then analyze the test in detail and determine what level of performance a test taker needs to demonstrate for each CELPIP level. This process has established a defensible link between each Reading and Listening scaled score and its corresponding CELPIP level.

There are many ways to measure the reliability of a test. One good measure is Cronbach’s alpha, which specifically measures the internal consistency of a test form. In practice, the result of this statistical measure typically falls between 0 and 1, where values near 0 indicate a lack of internal consistency and 1 indicates perfect consistency (negative values are possible but signal a problem with the items or the data). A result of 0.80 or higher is considered to be excellent. For both the CELPIP Reading and Listening components, the test forms have an average Cronbach’s alpha of 0.88. This indicates that CELPIP Reading and Listening test forms demonstrate excellent internal consistency.
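For readers curious how this statistic is computed, the standard formula is alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Python sketch below applies it to a tiny made-up data set of correct (1) and incorrect (0) responses. The function name and the data are illustrative only and are not CELPIP results.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a score matrix: one row per test taker, one column per item."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("at least two items are required")
    # Variance of each item across test takers.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(n_items)]
    # Variance of each test taker's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores have no variation")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Made-up responses from four test takers on three items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(cronbach_alpha(responses))  # 0.75
```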
